


Institutional Archive of the Naval Postgraduate School 


Calhoun: The NPS Institutional Archive 
DSpace Repository 


Theses and Dissertations 1. Thesis and Dissertation Collection, all items 


1996-03 


A hierarchical approach to multicast in a
datagram internetwork


Voigt, Robert J. 


Monterey, California. Naval Postgraduate School 


http://ndl.handle.net/10945/8465 


Downloaded from NPS Archive: Calhoun 


Calhoun is the Naval Postgraduate School's public access digital repository for
research materials and institutional publications created by the NPS community.
Calhoun is named for Professor of Mathematics Guy K. Calhoun, NPS's first
appointed, and published, scholarly author.


Dudley Knox Library / Naval Postgraduate School
411 Dyer Road / 1 University Circle 
Monterey, California USA 93943 





http://www.nps.edu/library 


NAVAL POSTGRADUATE SCHOOL 
Monterey, California 





DISSERTATION 


A HIERARCHICAL APPROACH 
TO MULTICAST 
IN A DATAGRAM INTERNETWORK 


by 


Robert J. Voigt 


March 1996 








Dissertation Advisor: Shridhar B. Shukla 


Approved for public release; distribution is unlimited.


DUDLEY KNOX LIBRARY 
NAVAL POSTGRADUATE SCHOOL
MONTEREY CA 93943-5107 





REPORT DOCUMENTATION PAGE 


Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources,
gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this
collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson
Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.











1. AGENCY USE ONLY (Leave Blank)

2. REPORT DATE
March 1996

3. REPORT TYPE AND DATES COVERED
Ph.D. Dissertation

4. TITLE AND SUBTITLE
A HIERARCHICAL APPROACH TO MULTICAST IN A DATAGRAM INTERNETWORK

5. FUNDING NUMBERS
This research was partly funded by the NSF RIA Grant 9309316.


6. AUTHOR(S) 


Voigt, Robert J. 





7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
Naval Postgraduate School 


Monterey, CA 93943-5000 


8. PERFORMING ORGANIZATION 
REPORT NUMBER 










9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES)

10. SPONSORING / MONITORING AGENCY REPORT NUMBER





11. SUPPLEMENTARY NOTES 
The views expressed in this dissertation are those of the author and do not reflect the official policy or 


position of the Department of Defense or the United States Government. 


12a. DISTRIBUTION / AVAILABILITY STATEMENT
Approved for public release; distribution is unlimited

12b. DISTRIBUTION CODE








13. ABSTRACT (Maximum 200 words)
Multicasting in datagram internetworks enables multi-party interactions among users distributed over
wide areas by eliminating duplicate packets in one-to-many and many-to-many communication. It requires
formation of a tree to distribute multicast data to the communicating group of members.
Present multicast techniques need improvement in scope control, resource discovery mechanisms, and
tree construction to efficiently support a large number of global groups with dense as well as sparse
membership. We deploy a hierarchy of clustered routers with the following features to make these
improvements. Each group is assigned a scope level that enables access to resources at that level when
members join and permits well-defined boundaries for scope control. The list of border routers and
presence of groups at any level is maintained and supplied to members by a level-specific resource
discovery mechanism called a registrar. To make tree construction scalable, the border routers determine
the shortest inter-cluster paths to source clusters using the available unicast routing information, facilitating
aggregation of router state for all senders in a cluster. Unlike the existing approaches, administrative
configuration of the hierarchy eliminates the need for locating distribution centers dynamically.


14. SUBJECT TERMS
Multicast, hierarchical, internetwork, registrar, scope control, clusters, CHARM, cluster-based

15. NUMBER OF PAGES
163

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT
Unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE
Unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT
Unclassified

20. LIMITATION OF ABSTRACT
UL

NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. 239-18





UNCLASSIFIED 


SECURITY CLASSIFICATION OF THIS PAGE 


13. We characterize the path length performance of the proposed hierarchy by providing an upper
bound for the penalty as compared to source-specific trees. Simulation results for randomly
generated topologies verify the worst case penalty and show the actual penalty to be significantly
less. These results show that the proposed hierarchy can be deployed over the existing unicast
routing infrastructure to achieve scalable multicasting with the required scope control while
keeping the path length penalty bounded. The architecture described permits further
improvements in the path length penalty if the identified enhancements to the underlying unicast
routing mechanisms are made.




Approved for public release; distribution is unlimited. 


A HIERARCHICAL APPROACH 
TO MULTICAST 
IN A DATAGRAM INTERNETWORK 


Robert J. Voigt 
Commander, United States Navy
B.S., United States Naval Academy, 1979 
M.S., Naval Postgraduate School, 1986 


DOCTOR OF PHILOSOPHY IN ELECTRICAL ENGINEERING 
from the 


NAVAL POSTGRADUATE SCHOOL 
March 1996 




ABSTRACT 

Multicasting in datagram internetworks enables multi-party interactions among users 
distributed over wide areas by eliminating duplicate packets in one-to-many and
many-to-many communication. It requires formation of a tree to distribute multicast data to the
communicating group of members. 

Present multicast techniques need improvement in scope control, resource discovery 
mechanisms, and tree construction to efficiently support a large number of global groups 
with dense as well as sparse membership. We deploy a hierarchy of clustered routers with 
the following features to make these improvements. Each group is assigned a scope level
that enables access to resources at that level when members join and permits well-defined
boundaries for scope control. The list of border routers and presence of groups at any level
is maintained and supplied to members by a level-specific resource discovery mechanism 
called a registrar. To make tree construction scalable, the border routers determine the 
shortest inter-cluster paths to source clusters using the available unicast routing 
information, facilitating aggregation of router state for all senders in a cluster. Unlike the 
existing approaches, administrative configuration of the hierarchy eliminates the need for 
locating distribution centers dynamically. 

We characterize the path length performance of the proposed hierarchy by providing 
an upper bound for the penalty as compared to source-specific trees. Simulation results for 
randomly generated topologies verify the worst case penalty and show the actual penalty 
to be significantly less. These results show that the proposed hierarchy can be deployed 
over the existing unicast routing infrastructure to achieve scalable multicasting with the 
required scope control while keeping the path length penalty bounded. The architecture 
described permits further improvements in the path length penalty if the identified 


enhancements to the underlying unicast routing mechanisms are made. 
I. INTRODUCTION


A. MULTICASTING IN AN INTERNETWORK 


1. Background 


The integrated packet-switched wide area networks of the future are expected to 
provide users with a variety of multi-party interaction capabilities. These services will 
benefit from multicasting for datagram internetworks, pioneered in [15], permitting
elimination of traffic caused by unnecessary packet copies in many-to-many 
communication. Multicast over internetworks typically requires the network nodes to form 
a routing tree based on the members of the group and their location. 

The capabilities that are enabled by such a multicast service will be useful in the 
following application areas: 

• Distributed databases

• Interactive multi-party interactions such as video conferencing

• Wide area distributed computations and repositories

• Distance learning

• Distributed interactive simulation

For multicasting over internetworks to be useful across these diverse application
areas, it must support the following model of group interaction: dynamic group
membership, ability of group members to reside anywhere on the network, and best-effort
datagram delivery as implemented in the present Internet.

The Multicast Backbone (MBone) is a global multicast internetwork in operation 
today and is growing rapidly [48]. The MBone is overlaid on top of the present Internet and 
supports the above model of group interaction. It is an outgrowth of the first two Internet 
Engineering Task Force (IETF) “audiocast” experiments in which live audio and video 


were multicast from the IETF meeting site to destinations around the world [9]. 


2. Mechanisms 


A fundamental mechanism required for multicasting over datagram internetworks is 
the construction of a data distribution tree among group members. There are two basic 
approaches to this multicast tree construction. The first is a shared or center-specific tree 
(CST) and the other is a source-rooted or source-specific tree (SST). The center-specific 
approach utilizes a single tree, rooted at a suitable center, that is shared by all senders. In 
the source-specific approach, each sender builds a separate tree rooted at itself. A properly 
constructed CST consumes fewer network resources, such as router memory space and link 
bandwidth, for an interaction that contains multiple senders. The use of a center provides
new members that are not aware of who the current members are with a destination in the
network to communicate with at the time of joining the interaction. Alternatively, the SST
approach minimizes path length metrics, such as the delay seen by receivers from each
sender. When SSTs are used, a way for the new receivers to discover the current senders
must be provided. Thus, the choice between CSTs and SSTs presents a trade-off between
delay and tree cost, as illustrated in Figure 1.1. Both approaches must provide new members
with a mechanism to discover resources, viz., the center in the case of CSTs and the
senders in the case of SSTs [13][53].
(a) Example Topology (b) Lowest Cost (CST) (c) Lowest Delay (SST)


Figure 1.1: SST vs. CST Trade-off 
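
The trade-off between delay and tree cost can be illustrated with a small sketch. The four-router ring, the unit link delays, the choice of router a as the center, and all function names below are assumptions made for illustration; they are not the topology of Figure 1.1. The sketch builds one shortest-path tree per sender (the SSTs) and one shared tree rooted at the center (the CST), then compares the links used and the delay from sender c to receiver d.

```python
from collections import deque

# Hypothetical four-router ring with unit-delay links (not the Figure 1.1 topology).
RING = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}

def shortest_path_tree(graph, root):
    """Breadth-first tree: maps each node to its parent on a shortest path to root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph[node]):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return parent

def tree_links(parent):
    """Set of undirected links used by a tree."""
    return {frozenset((n, p)) for n, p in parent.items() if p is not None}

def tree_distance(parent, a, b):
    """Hop count between a and b along the tree, through their common ancestor."""
    hops_from_a = {}
    node, hops = a, 0
    while node is not None:
        hops_from_a[node] = hops
        node, hops = parent[node], hops + 1
    node, hops = b, 0
    while node not in hops_from_a:
        node, hops = parent[node], hops + 1
    return hops + hops_from_a[node]

cst = shortest_path_tree(RING, "a")                     # one shared tree, center a
ssts = {s: shortest_path_tree(RING, s) for s in RING}   # one tree per sender

cst_links = tree_links(cst)
sst_links = set().union(*(tree_links(t) for t in ssts.values()))

cst_delay_c_to_d = tree_distance(cst, "c", "d")
sst_delay_c_to_d = tree_distance(ssts["c"], "c", "d")

print("links used  CST:", len(cst_links), " SSTs:", len(sst_links))
print("delay c->d  CST:", cst_delay_c_to_d, " SST:", sst_delay_c_to_d)
```

In this assumed ring the shared tree occupies fewer links and needs only one tree, while the source-specific tree rooted at c delivers to d over the direct link.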


The current IP multicasting techniques use two standardized protocols, the
Distance Vector Multicast Routing Protocol (DVMRP) [52] and the Multicast Extensions to
Open Shortest Path First (M-OSPF) protocol [40], both of which construct SSTs. DVMRP is the
predominant multicast protocol used in the MBone. In addition to the above protocols, two
new protocols called the Core-based Tree protocol (CBT) [3] and Protocol-independent
Multicast (PIM) [18] are under consideration by the IETF. The CBT protocol constructs a
CST and the PIM protocol provides the end user with the option to join a CST or an SST
with respect to a particular sender.


B. CHALLENGES IN MULTICAST


1. Tree Construction and Maintenance 


Approaches based on CSTs require the construction of a spanning tree for the 
members of a group in the network. The construction of an optimal spanning tree with 
respect to tree cost for a subset of a network is the Steiner Minimum Tree (SMT) problem 
(see Chapter II, section B.1). The construction of an SMT is a well-known NP-complete
problem [27]. As such, construction of minimum cost trees for dynamic groups in 
internetworks using heuristic solutions for the SMT problem is considered impractical. 
However, centers can be used to facilitate tree construction as well as resource discovery. 
Both PIM and CBT use centers, called rendezvous points (RP) and cores, respectively, as
an aid in constructing a spanning tree. The RP and core act as the root of a shared distribution
tree. The RP also provides a new receiver with the current list of senders to facilitate the 
option of joining an SST. 

In their present forms, these protocols suggest that the centers may be selected 
administratively. The center clearly plays a vital role in the quality of the resulting trees. If 
the center is located far from the participants, a longer packet delay is experienced and 
excessive resources are consumed. If a good center is selected for a set of participants 
which later changes drastically, the center may severely impact the quality of the tree for 
the new participants. Finally, the probability that an administratively located center will be 
a good one for all participants decreases with the number of participants [47][51].

After the trees have been constructed, two methods are used commonly for 
maintaining the tree state in routers, viz., “hard state” and “soft state.” These refer to
whether router table entries for a tree are allowed to time out (soft) or must
be explicitly removed (hard). The advantage of soft state is that the tree is constantly


refreshed to take advantage of other lower delay paths as they become available. The 
disadvantage is that the tree must be refreshed or re-constructed periodically leading to 
additional processing overhead in the routers and a potential delay in the delivery of the 
first data packet following a refresh. The advantage of a hard state is that the tree is 
constructed and removed explicitly. Thus, it is not “data-driven” and does not cause 
processing delays once the tree is established. The disadvantage is that it is not possible to 


adapt the tree to the changing delay conditions in the network. [1] 
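
The difference between the two kinds of state can be sketched as follows. The class name, the tick-based clock, and the lifetime of three ticks are hypothetical choices made for illustration; none of the protocols discussed here is being quoted. A soft-state entry disappears when refreshes stop arriving, whereas a hard-state entry persists until it is explicitly removed.

```python
class TreeStateTable:
    """Per-router multicast tree entries, kept as soft or hard state."""

    def __init__(self, soft, lifetime=3):
        self.soft = soft          # True: entries time out; False: explicit removal only
        self.lifetime = lifetime  # hypothetical timeout, in clock ticks
        self.entries = {}         # group -> tick of the last install or refresh

    def refresh(self, group, now):
        """Install an entry or refresh an existing one."""
        self.entries[group] = now

    def remove(self, group):
        """Explicit teardown; the only way a hard-state entry disappears."""
        self.entries.pop(group, None)

    def tick(self, now):
        """Expire soft-state entries that were not refreshed in time."""
        if not self.soft:
            return
        expired = [g for g, t in self.entries.items() if now - t > self.lifetime]
        for group in expired:
            del self.entries[group]

    def has(self, group):
        return group in self.entries


soft = TreeStateTable(soft=True)
hard = TreeStateTable(soft=False)
for table in (soft, hard):
    table.refresh("G1", now=0)
    table.tick(now=10)  # ten ticks pass with no refresh

soft_after_timeout = soft.has("G1")
hard_after_timeout = hard.has("G1")
hard.remove("G1")
hard_after_remove = hard.has("G1")
print(soft_after_timeout, hard_after_timeout, hard_after_remove)
```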


2. Scalability 


The two multicasting protocols deployed in the MBone which construct SSTs are 
Distance Vector Multicast Routing Protocol (DVMRP) and Multicast extensions to Open 
Shortest Path First (M-OSPF) routing protocol. Both are regarded as suitable for 
multicasting within a routing domain only as both protocols exhibit occasional
network-wide broadcasting behavior. As a result of this broadcast behavior, DVMRP and M-OSPF
are not scalable to inter-domain interactions. 

The explicit tree construction techniques in CBT and PIM remove this broadcast
behavior problem. However, as both use a center during tree construction, they have the
scaling problem of locating a center. Due to the computational complexity of locating a 
center algorithmically for the inter-domain case, locating these centers administratively has 
been suggested. However, a single center located in this manner is likely to be a poor choice 
when group members span the entire topology. 

Another significant scalability problem in the inter-domain case is the aggregation of 
senders and groups. Aggregation refers to the maintenance of combined state
corresponding to related collections of senders and groups in the intermediate routers,
keeping the total state maintained at a router manageable for a large number of senders and
groups. A single shared tree, which requires on-tree routers to maintain one (*, G) entry
regardless of the number of senders to the group, provides the best case for sender
aggregation as all the senders are aggregated onto one distribution tree. However, a single


shared tree for a group spanning the Internet is not a scalable solution for either traffic 
concentration or path length performance metrics. The aggregation of groups is a more 
difficult problem which has not been addressed in any of the multicast proposals or 
implementations. This would encompass the concept of multiple groups sharing one link 


with a single routing entry for this “group of groups.” 
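
The effect of sender aggregation on router state can be shown with a short count. The workload and function names below are hypothetical. The sketch assumes the worst case for source-specific trees, in which a router lies on the tree of every sender of every group, and compares it with one (*, G) entry per group on a shared tree.

```python
def source_specific_entries(senders_by_group):
    """One (S, G) entry per sender per group: worst case at an on-tree router."""
    return {(s, g) for g, senders in senders_by_group.items() for s in senders}

def shared_tree_entries(senders_by_group):
    """One (*, G) entry per group, independent of the number of senders."""
    return {("*", g) for g in senders_by_group}

# Hypothetical workload: two groups with three and two senders.
workload = {"G1": ["s1", "s2", "s3"], "G2": ["s1", "s4"]}

sg_count = len(source_specific_entries(workload))
star_g_count = len(shared_tree_entries(workload))
print(sg_count, star_g_count)
```

Aggregating groups, as opposed to senders, would reduce the second count further, which is the harder problem noted above.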


3. Scope Control 


Presently the “class D” Internet Protocol (IP) address is reserved for multicast (i.e., the 
most significant 4 bits of the address are 1110) [14]. Since the class D address is not bound 
to any specific part of the network in the Internet, it is considered to be a “flat” address. The 
difficulty with this approach is that any multicast, due to the network-wide validity of its 
address, becomes a potentially global multicast. The Internet community considers it
undesirable to introduce structure into the multicast address, as doing so requires a change in the
installed base of MBone applications and violates the ability of a sender to send to any
group from anywhere in the network by simply knowing its address.
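
As a small illustration of the class D rule, the following sketch tests whether the four most significant bits of an IPv4 address are 1110. The function name is hypothetical; the standard-library `ipaddress` module is used only to parse the dotted-decimal string.

```python
import ipaddress

def is_class_d(address: str) -> bool:
    """True if the four most significant bits of the IPv4 address are 1110."""
    value = int(ipaddress.IPv4Address(address))
    return (value >> 28) == 0b1110

for addr in ("224.0.0.1", "239.255.255.255", "192.0.2.1", "240.0.0.1"):
    print(addr, is_class_d(addr))
```

Nothing in the bit pattern identifies a region of the network, which is why the address is described as flat.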

Another difficulty presented by a flat address space is that there is no method to scope 
or restrict a multicast within a boundary based on the address alone. The present solution 
is to limit the “time-to-live” (ttl) field in multicast packets. The ttl field in the Internet
Protocol (IP) has a time unit of seconds, but the protocol specifies that every router must decrement the
field by at least 1 [41]. When each router decrements it by exactly 1, the field actually specifies the number of hops a packet
is allowed to travel before being dropped. This makes it difficult to determine the value to
be inserted in the ttl field. As the Internet becomes more connected, a small ttl value may
reach many more nodes than the originator intended.
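
The following sketch shows why a hop budget is a fragile scope boundary. It is a simplification: the two five-node topologies and the function name are assumptions for illustration, and the ttl is modeled as the number of links a packet may cross. With the same budget of two hops, adding a single link to the topology lets the packet reach every node.

```python
from collections import deque

def nodes_within(graph, source, hops):
    """Nodes reachable from source by crossing at most `hops` links."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # hop budget exhausted; do not forward further
        for nbr in graph[node]:
            if nbr not in distance:
                distance[nbr] = distance[node] + 1
                queue.append(nbr)
    return set(distance)

# Hypothetical chain a-b-c-d-e, and the same chain with one extra link a-d.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
denser = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"],
          "d": ["c", "e", "a"], "e": ["d"]}

reach_chain = nodes_within(chain, "a", 2)
reach_denser = nodes_within(denser, "a", 2)
print(sorted(reach_chain))
print(sorted(reach_denser))
```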


C. CONTRIBUTIONS 
In this dissertation, we have addressed the above three challenges facing network level 
multicast by developing a hierarchy of clusters of routers as the framework within which 


multicast distribution trees are to be constructed. We have developed a Cluster-based
Hierarchical Architecture for Multicast, referred to as CHARM throughout this document,
and evaluated its performance with respect to delay.

The general approach taken in this work is to make run-time tree construction, scope 
control, and resource discovery scalable by using an administratively configured hierarchy 
of routers. The overall contribution made by this work is to propose a hierarchical multicast 
architecture and show that the proposed hierarchy can be deployed over the existing unicast 
routing infrastructure to achieve scalable multicasting while maintaining a known worst 
case path length penalty and actual path lengths that are very close to the shortest paths. 

The specific contributions made by this work are listed below. 


• The use of a scope level for each group enables access to resources at that level
when members join and permits well-defined boundaries for scope control. It
also permits the group address to remain flat.

• The list of border routers and presence of groups at any level is maintained and
supplied to members by a level-specific resource discovery mechanism called
a registrar. The organization and functions of this entity are described in detail.
The need for such an entity has been stated [8]; however, none of the existing
multicast protocols has addressed it.

• Tree construction is made scalable by making the border routers determine the
shortest inter-cluster paths to source clusters, based on the available unicast
routing information, facilitating aggregation of router state for all senders in a
cluster. Existing protocols require construction of shortest paths to individual
senders, creating scaling problems.

• The need for locating distribution centers dynamically, a major drawback of
existing protocols, is eliminated by an administrative configuration of the
hierarchy. This also permits incorporation of suitable management policies.

• The path length performance of the proposed hierarchy has been characterized
by providing an upper bound for the penalty as compared to source-specific
trees. Simulation results provided for randomly generated topologies verify the
worst case penalty and show that the actual path lengths can be made to
approach those given by SSTs by sacrificing the granularity of scope control.

• Finally, methods have been suggested to achieve further improvements in the
path length penalty by identifying the enhancements required to the underlying
unicast routing mechanisms.


D. DISSERTATION OVERVIEW 


This dissertation is organized as follows. 

In Chapter II, we discuss current and related work in the area of multicasting for 
datagram internetworks and summarize the properties of different protocols with respect to 
the features desirable in a multicasting technique suitable for global deployment.

In Chapter III, we introduce CHARM and describe its components in detail including 
the actions taken upon joining of senders and receivers from the different parts of the 
hierarchy. We illustrate the concepts behind the organization using a running example of a 
small three-level topology. We also identify all the assumptions we make in terms of the
unicast routing required and the properties of the clustering assumed. 

In Chapter IV, we analyze the proposed architecture with respect to the path lengths 
obtained. We arrive at the most suitable approaches to route data inside the source cluster, 
inside the receiving cluster, and between the clusters by describing the alternatives and 
their implications in detail. Once the most suitable techniques are identified, we describe 
how the worst case path length penalty between any sender and receiver is guaranteed. 

In Chapter V, we describe how additional infrastructure in terms of hierarchical 
unicast routing and collection of a measure of goodness permit improvement in the worst 
case path length penalties in the receiving and sending clusters respectively. 

In Chapter VI, we describe the network overhead resulting from the deployment of 
CHARM in terms of the control traffic and the amount of state maintained at the routers. 

In Chapter VII, we provide path length results for three different types of multi-party 
interactions on two types of topologies generated using random graphs. An analysis of the 
results is provided showing exactly how the penalties are distributed among the senders and 
receivers as the size of clusters in a hierarchy varies. 


In Chapter VIII, we provide concluding remarks and suggestions for future research. 





II. RELATED WORK


In this chapter, we examine the fundamental work on which the existing multicast 
protocols and proposals are based. We then survey the protocols and proposals and evaluate 


them with respect to the requirements of a globally scalable multicast. 


A. TECHNIQUES 


Multicast protocols are based on the work done on multi-destination delivery by 
Dalal and Metcalfe [12] and center-specific trees by Wall [53]. The results and techniques 
reported in these have led to the development of protocols based on source-specific and 
center-specific trees. Other areas of research examine the distributed Steiner tree problem
[22, 35] and solutions for using multicast in specific application areas [33].


1. Reverse Path Forwarding 


Dalal and Metcalfe introduce the concept of Reverse Path Forwarding (RPF) as a
routing method to reduce the number of duplicate copies of broadcast packets in a network.
Their technique forwards a broadcast packet on all links except the one it arrived on,
provided the one it arrived on is along the shortest path back to the sender. If it is not, the
packet is dropped. Use of this method does not eliminate duplicate packets; rather, it limits
them and prevents the flooding from continuing indefinitely.

Figure 2.1 shows how RPF works in a series of illustrations. Each graph shows an
iteration of RPF for a broadcast by the node marked S. In this example, every node except
nodes c and d receives two copies of the data; nodes c and d receive three. After only three
iterations the broadcast has been delivered to all of the members and after four, it is
complete. Using RPF, no additional information about the network other than shortest path
distances between two nodes is required. No additional information is stored at a node and
therefore no additional data structures are required beyond the standard unicast routing
tables.





Figure 2.1: An Example of Reverse Path Forwarding


The algorithm is considered to be practical for broadcast and optimal, in that data is 
delivered on the shortest path, if delays are symmetric on all links [12]. RPF is considered 
to be an efficient method for broadcast, where all nodes are involved. It is unsuitable for
delivery to a subset of the nodes since nodes that are not intended recipients still receive packets.
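
The forwarding rule of RPF can be sketched in a few lines. The four-node topology, the unit link costs, the tie-breaking rule for equal-length paths, and the function names are all assumptions for illustration; this is not the topology of Figure 2.1. Each node accepts a packet only if it arrives from its neighbor on the shortest path back to the sender, then forwards it on every other link; packets arriving on any other link are counted and dropped.

```python
from collections import deque

def reverse_path_parents(graph, source):
    """Neighbor each node uses on its shortest path back to source (unit link costs)."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph[node]):  # sorted() gives a deterministic tie-break
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return parent

def rpf_broadcast(graph, source):
    """Simulate RPF flooding; return the number of copies each node receives."""
    parent = reverse_path_parents(graph, source)
    copies = {node: 0 for node in graph}
    in_flight = deque((source, nbr) for nbr in graph[source])
    while in_flight:
        sender, node = in_flight.popleft()
        copies[node] += 1
        if sender != parent[node]:
            continue  # did not arrive on the reverse path: drop
        for nbr in graph[node]:
            if nbr != sender:  # forward on all links except the arrival link
                in_flight.append((node, nbr))
    return copies

# Hypothetical square topology: s-a, s-b, a-c, b-c.
SQUARE = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s", "c"], "c": ["a", "b"]}
copies = rpf_broadcast(SQUARE, "s")
print(copies)
```

Every node is reached, some nodes receive a duplicate, and the flood terminates because each node forwards the packet at most once.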


2. Reverse Path Multicast


Multicast capability has existed at the local area network (LAN) level using the LAN 
shared medium [28, 29]. Deering and Cheriton introduced multicasting as an efficient 
multi-destination delivery in an internetwork in Reference [15]. They also introduced the 
concept of an unknown destination delivery using a group address. The group address, 


similar to a broadcast address in that it has more than one intended recipient, restricts the 


recipients to some subset of the network. The reverse path multicast (RPM) work adds a 
number of refinements to the work done in Reference [12] making it more suitable for an 
internetwork. We examine the work done in References [13] and [15] as it is the basis for 
the present internet multicast protocols. RPM is presented as a series of enhancements to
RPF below.

The first enhancement to RPF is that the group address identifies a host group to 
receive the packets sent to that address [10]. The sender need not know the membership of
the group. Two types of groups are possible - open and closed. In a closed group, a sender 
is required to be a member in order to send to the group. In an open group, the sender is not 
required to be a member [15]. As in References [13] and [15], we focus on the less 
restrictive open group in this work. 

Deering and Cheriton specify extensions to two existing distributed unicast routing 
algorithms, namely, distance vector and link state. Distance vector routing requires that 
each node maintain the distance from itself to each possible destination. It does this by 
gathering information from its neighbors' distance vectors [41]. In link state routing, each
router keeps a complete map of the topology and computes routes to each destination. It
does this by sending advertisements to all routers updating the network topology as it
changes. Distance vector routing is described in References [26] and [37]. Link state or
Open Shortest Path First (OSPF) routing is described in Reference [39]. The
extensions to these unicast routing protocols have led to the Distance Vector Multicast
Routing Protocol (DVMRP) and Multicast Extensions to OSPF (M-OSPF). One of the key
features of these protocols is that they rely on the unicast routing tables stored in the routers 
to construct their multicast routing tables. 
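
The distance vector computation on which these multicast extensions rely can be sketched as a single update step. The function name, the table layout, and the example costs are hypothetical; the sketch shows only the basic rule of adopting a neighbor's advertised distance plus the cost of the link to that neighbor whenever it improves on the current entry.

```python
def merge_vector(table, neighbor, link_cost, neighbor_vector):
    """Update table (dest -> (cost, next_hop)) from one neighbor's distance vector.

    Returns True if any entry changed.
    """
    changed = False
    for dest, advertised in neighbor_vector.items():
        candidate = link_cost + advertised
        if dest not in table or candidate < table[dest][0]:
            table[dest] = (candidate, neighbor)
            changed = True
    return changed

# Hypothetical router a with neighbors b (link cost 1) and d (link cost 2).
table_a = {"a": (0, None)}
merge_vector(table_a, "b", 1, {"a": 1, "b": 0, "c": 4})
merge_vector(table_a, "d", 2, {"a": 2, "c": 1, "d": 0})
print(table_a)
```

The next-hop column of such a table is what a reverse path check consults to decide whether a packet arrived on the shortest path back to its sender.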

The second enhancement to RPF is that multicast at the internetwork level stores 
multicast specific routing data in routing tables. At each router, these tables store a subset 
of the links upon which multicast packets for a group are to be forwarded. Similar to 
source-based forwarding discussed in Reference [12], shortest path trees are formed from 


each sender to all members of a group. This is accomplished using a data-driven method
which is described below. The method constructs the trees and fills in the routing tables
based on where the data is not supposed to go. In essence, the data is flooded or broadcast 
out using RPF and then links are “pruned” back as they are not needed. 

The construction of the multicast tree is best understood in terms of the incremental 
improvements made to RPF. Reverse path broadcast (RPB) refers to RPF modified by 
identifying “child” links and eliminating duplicate data on LANs with more than one 
attached router [15]. Only one of the attached routers is designated to forward packets onto
the LAN. A further modification called truncated reverse path broadcast (TRPB) stops the 
multicast packets from being forwarded onto a LAN in the case where no members of a 
group exist by sending a prune message back towards the source of the multicast [15]. This 
effectively prunes the “leaf” from this branch of the tree. In a final refinement, reverse path 
multicast (RPM), the prune message is in the form of a non-membership report (NMR) 
[15]. If intermediate routers receive NMRs from each of their outgoing links, they in turn 
generate an NMR and send it back up the tree towards the source. The NMRs prune
the tree back to only those routers that have members or are on the path to other routers
that have members attached.
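
The outcome of the NMR exchange can be sketched as a bottom-up pass over the broadcast tree. The tree, the member set, and the function name are assumptions for illustration, and the sketch computes only the final result rather than modeling individual messages or timers. A router stays on the tree if it has attached members or if at least one of its downstream links has not reported non-membership.

```python
def prune_with_nmrs(children, root, routers_with_members):
    """Return the routers left on the tree after NMRs have propagated upstream."""
    kept = set()

    def stays_on_tree(router):
        # Evaluate every child first: each outgoing link reports independently.
        child_stays = [stays_on_tree(c) for c in children.get(router, [])]
        stays = router in routers_with_members or any(child_stays)
        if stays:
            kept.add(router)
        return stays  # False corresponds to this router sending an NMR upstream

    stays_on_tree(root)
    kept.add(root)  # the source's router always remains the root
    return kept

# Hypothetical broadcast tree rooted at S; only router d has attached members.
TREE = {"S": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
remaining = prune_with_nmrs(TREE, "S", {"d"})
print(sorted(remaining))
```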

The above strategy, used in DVMRP, exploits flooding and subsequent pruning to 
shape the multicast tree. This method constructs a lowest delay tree; however, the flood and 
prune strategy, with its occasional broadcast behavior, is not considered a scalable 


internetwork-wide solution. 


3. Center-specific Trees 


The center-specific tree (CST), an extension of the work by Dalal [11], proposes to 
provide a more efficient method of broadcast and selective broadcast (i.e. multicast) [53]. 
The work is based on the use of a minimum spanning tree (MST) as a delivery mechanism. 
An MST is a single minimum-cost tree which spans all nodes. Computing an
MST is a well-known graph-theoretic problem for which many solutions exist [6]. The problem
of computing a minimum cost tree for a subset of the graph is known as the Steiner minimum
tree problem (see section B.1).

In [53], Wall proposes a new technique for constructing a center-specific tree based 
on locating a center of the network called center-based forwarding. This method for 
constructing a single shared tree attempts to strike a balance between network cost and 
delay as discussed in Chapter I, section B.1. Minimal cost and minimal delay cannot be
achieved using one type of distribution tree [53]. 

Wall’s result is significant in that it shows the maximum delay bound of a center-specific
tree to be twice that of a shortest path tree. This result is used in Reference [1] to
build a multicast protocol known as Core Based Trees. The construction of the center-specific
tree does not use flood and prune but rather an explicit join mechanism with no
flooding properties. An explicit join is characterized by a unicast message which results in
a branch being added to a tree. The low tree construction and storage costs of this approach
are potentially useful for a global internetwork-wide multicast for scaling purposes. We use
the center-specific tree in the hierarchy to multicast non-delay-critical control messages.


B. ISSUES IN TREE CONSTRUCTION 


Each of the above multicast techniques involves the construction of some type of 
delivery tree. We examine the issues related to constructing and maintaining trees for 


multicast. 


1. The Steiner Tree Problem 


As discussed in the previous section, a single shared tree among nodes provides for a 
low cost solution to either a broadcast or a multicast distribution problem. The difference 
between these two problems is that a minimum cost tree that spans all of the nodes 
(broadcast) is a minimum spanning tree, while the minimum cost tree spanning only a 
subset (multicast) is a Steiner minimum tree (SMT) [27]. 

The difficulty with constructing an SMT is that it is an NP-complete problem [31].
Several heuristic approaches have been suggested to find a good solution in References
[22], [27] and [35]. These approaches are not suitable for the multicast model of the
internet, that of open groups and dynamic group membership [1]. Even the center-specific
tree construction method developed in Reference [53] needs an initial center which is used
as a focal point for constructing the tree. The location of a center with respect to the group
members affects the quality of the resulting tree [47]. While it is possible to get a good
initial center, dynamic membership changes drive a need for greater flexibility that cannot
be provided by these solutions.


2. Tree Reconfiguration 


The heuristic approaches applied to construct a minimum cost shared tree assume 
some initial member distribution. It follows then that the quality of the tree will likely 
change as members join and leave the group. The work done in Reference [19] attempts to 
answer how the tree quality gets affected with membership changes. 

There are three potential solutions to handle dynamic membership. One is to re- 
compute the tree after each change, another is to make modest alterations to an existing 
spanning tree [54]. For example, in Reference [55], an algorithm was applied to modify the 
delivery tree dynamically by adding branches only when new members join and removing 
them only when members leave. Re-computing the tree after each change is an expensive 
solution leading to disruptions to members who remain in the group. A third solution is to 
build a sub-optimal tree which is resilient to change [19]. 

It has been shown that suboptimal trees do not perform too poorly [19, 54, 55]. They 
show that while there are spikes of poor performance for certain members of the group, 
overall, the average performance does not get substantially worse using either a slow re- 
configuration or a sub-optimal tree. 

These results are potentially useful for a hierarchical multicast that may require more 
than one tree to accomplish a multicast. They show that a periodic refresh is sufficient to 
maintain tree quality and frequent dynamic reconfiguration only increases the amount of 


processing and storage overhead while not improving the path lengths significantly. 


3. Center Location for Shared Trees 


The problem of locating a center for the center-specific tree construction protocols is 
examined in References [46], [47] and [51]. Prior work has focussed on evaluating 
performance of randomly placed centers. Center location is also discussed in References 
[1] and [53] but no working protocol is offered. In References [46], [47], and [51], it is 
shown that with some intelligent placement, the performance of the center-specific tree can 
be improved. These approaches again offer heuristic solutions, since finding
the optimal center is an NP-complete problem.

In the hierarchical multicast architecture we propose, we note the value of the center- 
specific tree from a tree-cost viewpoint. We believe that it is not practical, for a global 
multicast solution, to incur the potential path length penalty possible when a center-specific 
tree is used for the data distribution portion of the hierarchy. This penalty is particularly 
significant as the imposition of a hierarchy will incur penalties of its own. Furthermore, we 
show that the deployment of the hierarchy reduces the need for another mechanism for 


sender aggregation that is provided by a shared tree. 


C. EXISTING IMPLEMENTATIONS 


Existing multicast protocol implementations can be categorized as intra- and inter-domain
multicast protocols. The primary distinction between the categories is the
scalability improvements in the inter-domain protocols. Another important distinction is
that the inter-domain protocols are designed to be independent of the underlying unicast
protocols. The protocols to accomplish the two types of multicast are discussed below.


1. Intra-domain Techniques 


The intra-domain protocols have been implemented and have been in use for several years.


a. DVMRP 
The IP multicast protocol presently deployed in the MBone is DVMRP [52], which is based on
work done by Steve Deering [14] and in which the routers use the distance vector method for
unicast route calculation. DVMRP constructs a source-specific tree for each sender in a
group by maintaining an entry for each source, group (S,G) pair.

In DVMRP, the first multicast data packet is flooded out using RPM as discussed 
in section B.2 above. The tree is formed through pruning based on the NMRs of non- 
participating routers. This flood and prune strategy is the primary reason why DVMRP is 
not considered a scalable method for multicast tree construction. Flooding of data is not a 
desirable feature in an internetwork environment, because it creates unnecessary overhead 


for un-involved nodes. 
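The overhead of flood and prune can be seen in a simplified sketch. It assumes unit link costs and a hypothetical six-router topology, and it ignores prune timers and duplicate suppression on non-tree links; it is not the DVMRP implementation. The first packet reaches every router along the source-rooted tree, after which routers with no members and no remaining downstream routers prune themselves.

```python
from collections import deque

def flood_and_prune(graph, source, member_routers):
    """Return (routers reached by the flood, routers left on the tree after pruning)."""
    # Flood: build the source-rooted shortest-path tree over all routers.
    parent = {source: None}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    flooded = set(order)

    # Prune: visit routers farthest-first; a router stays on the tree only if
    # it has members or at least one downstream router that stays.
    kept = set(member_routers) | {source}
    for node in reversed(order):
        if node in kept and parent[node] is not None:
            kept.add(parent[node])
    return flooded, kept

# Hypothetical topology (adjacency lists, every link has unit cost).
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E", "F"],
    "D": ["B"],
    "E": ["C"],
    "F": ["C"],
}
flooded, kept = flood_and_prune(graph, source="A", member_routers={"D"})
pruned = flooded - kept
```

In this example three of the six routers receive the first packet and must send a prune even though none of them has a group member.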


b. M-OSPF 


The approach based on OSPF unicast routing is called Multicast Extensions to 
OSPF (M-OSPF) and is described in Reference [40]. OSPF is a link-state routing protocol 
which provides routers a link-state database describing the network topology and updates 
the database through the use of advertisements. In M-OSPF, a new OSPF advertisement is 
added describing multicast locations [40]. Like OSPF link state changes, group information
is broadcast across the network so that each router can maintain its link-state database.

M-OSPF can only be run in a network running OSPF. All M-OSPF routers have 
a complete topology map of the location of group members. The broadcast nature of the 
link state advertisement and the storage required for a global membership information are 
the main reasons why this approach is not considered a scalable internetwork-wide 


solution. 


2. Inter-domain Techniques 
The primary changes for the inter-domain techniques are the addition of shared trees,
unicast protocol independence, and the addition of explicit joins and leaves to reduce or
eliminate the flooding behavior of the intra-domain protocols in tree construction.


a. PIM 


Protocol Independent Multicast (PIM) [16] is one of the two draft standards being
considered by the Internet Engineering Task Force (IETF) to solve the scalability problem.
The other is CBT, described below. PIM is designed to be independent of the underlying
unicast protocol. The PIM architecture takes advantage of the existing hierarchy in the
network and introduces two modes, dense and sparse. These two modes address the type of
interaction desired relative to group composition and the size of the network. A dense group
has a group membership that is “densely distributed across an internet” [17] and,
conversely, a sparse mode group has a membership which “may span wide-area (and
inter-domain) internets” [16]. Dense mode PIM (PIM-DM) [17] is similar to DVMRP and is a
source-specific tree based protocol. This method was found to be undesirable for groups
whose members are distributed sparsely across a wide area. Sparse mode PIM (PIM-SM) [16]
uses a center-specific tree construction designed to address the scalability of its
dense mode counterpart. The distribution center of PIM-SM is called a Rendezvous Point
(RP). PIM-SM multicast tree construction also allows a hybrid mode, which includes both
center-specific and source-specific trees, when the receivers request it [18].

PIM tree construction revolves around the selection of the rendezvous point (RP).
All senders for a group must register with the RP in the network. Receivers requesting to
join the group set up the path from themselves to the RP. This is also how they learn about
the senders in case they later desire to form a shortest path tree between them and eliminate
use of the RP for that source and receiver pair.


b. CBT 


Core Based Trees (CBT) is a protocol for multicast tree construction which also
uses a center-specific or shared delivery tree [1]. CBT is similar to PIM-SM in that they
both initially choose a center from which they build the tree; however, in CBT, multiple
centers are allowed for one group. One of the primary differences between PIM-SM and
CBT is that CBT maintains “hard state” while PIM uses a “soft state.” A soft state approach
is data-driven: trees are built as needed and state for the trees is allowed to time out. The
hard state approach does not time out tree information after a period of inactivity.
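The difference between the two kinds of state can be sketched with a small table of per-group entries. The class, its method names and the 180-second timeout are hypothetical choices for illustration and are not taken from the PIM or CBT specifications.

```python
class TreeStateTable:
    """Per-group tree state with an optional inactivity timeout."""

    def __init__(self, timeout=None):
        # timeout=None models hard state; a number of seconds models soft state.
        self.timeout = timeout
        self.last_refresh = {}

    def refresh(self, group, now):
        """Record a join or a data-driven refresh for the group at time `now`."""
        self.last_refresh[group] = now

    def remove(self, group):
        """Explicit teardown, the only way a hard state entry is deleted."""
        self.last_refresh.pop(group, None)

    def is_active(self, group, now):
        """True if the entry exists and, for soft state, has not timed out."""
        if group not in self.last_refresh:
            return False
        if self.timeout is None:
            return True
        return now - self.last_refresh[group] <= self.timeout

soft = TreeStateTable(timeout=180)  # hypothetical timeout value, in seconds
hard = TreeStateTable()
soft.refresh("group-1", now=0)
hard.refresh("group-1", now=0)
```

After 200 seconds without a refresh the soft state entry is gone while the hard state entry remains until it is removed explicitly.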

A consequence of the hard state is that multicast tree branches do not adapt to 
unicast route changes [1]. This is good for uninterrupted packet flow since the routes stay 
constant, but it may lead to sub-optimal branches on the tree. Also, soft state trees tend to 
have a higher join latency since the tree state may time out where a new branch might graft. 


We show that both of these approaches can be applied in our hierarchical multicast. 
3. Hierarchy-based Proposals 


a. Hierarchical DVMRP 
Hierarchical DVMRP (HDVMRP) is a proposed solution to address the 


exponential growth of MBone. It consists of a two level hierarchy which runs separate 
instances of DVMRP at the two different levels. The level 1 regions are composed of one 
or more Internet domains. The regions contain one or more boundary routers which are 
responsible for connecting regions. All boundary routers run two levels of multicast 
protocols where the level 2 protocol is an inter-region protocol. The result is that the 
protocol uses a flood-and-prune strategy at the inter-domain level. This yields the benefit
of shortest path trees at the inter- and intra-domain levels [49].

The primary problem is that the flood-and-prune nature of DVMRP continues to 
be the basic tree construction strategy of this protocol. The number of nodes involved in 
flood-and-prune is reduced in that the level 2 trees are only between border routers. 
However, in order to construct the tree, the first level 2 packets are flooded throughout a 
region. In addition, the encapsulation of packets at level 2 leads to duplication of packets
inside a level 1 region.

We view this proposal as a specific incremental improvement to an existing 
protocol. It solves some of the scaling problems of the MBone by only flooding regionally 
and less often than pure DVMRP. While the concept of regions is similar to the clusters 


proposed in this hierarchy, the use of the concept is quite different. We do not use any 


flooding-based tree construction and we do not allow the duplication of packets inside a 


cluster to arrive at the same destination. 


b. Hierarchical PIM 


Hierarchical PIM (HPIM) is a proposal to solve very specific problems with PIM, 
that of the advertisement of rendezvous points (RPs) to group members and mapping RPs 
to groups [24]. The proposal consists of a control hierarchy for RPs in which the RPs 
themselves are structured in a hierarchy, not the members. 

In HPIM, the members join towards the lowest level RP. If the join is meant to go 
to a higher level scope, then that RP forwards the join request to the next level RP. The RPs 
are structured in a hierarchical fashion to allow for increasing scope of the multicast. In 
PIM, the senders unicast to the RP which then multicasts out on the shared tree to all 
receivers. In HPIM, the data is forced to flow along the RP tree from RP to RP at each level. 
At each RP it is de-encapsulated, checked for receivers and then re-encapsulated and 
forwarded on to the next level RP. If receivers exist at a level, the RP forwards the data out 
onto the tree. The problem this solves for PIM is the case in which the group is local but
the nearest RP is not. The flat version of PIM forced multicast traffic outside of the
user’s desired scope to get to the nearest RP. The advantage of PIM (and HPIM) is that
members who have bad paths via their RP for senders who have a high data rate have the
option of switching to a shortest path tree.

We view this proposal as a specific incremental improvement to an existing 
protocol. It solves the problem of PIM RP discovery, but it introduces a new level of 
complexity and a new set of problems, primarily that of the data forced to flow along what 


may be a very sub-optimal path from RP to RP. 


c. Hierarchical Multicast 


A hierarchical multicast routing algorithm is proposed in Reference [57] which
includes clustering the network into a two level hierarchy. The clusters are the first level
of the hierarchy. One of the nodes of each cluster is selected as the local core. The cores
form the second level of the hierarchy.

The cores form a shortest path tree rooted at themselves to all of the members in 
their cluster and another to all other cores. Senders send traffic to the core which multicasts 
it out on both levels, one for local traffic and one to the other cores on the tree. The authors 
call this a group oriented multicast routing (GOMR) protocol. 

GOMR performs better for smaller cluster sizes but has a problem with scalability 
since it only supports 2 levels. The larger cluster sizes scale better but there is a trade-off 
in end-to-end delay. Three separate protocols are introduced in Reference [57], of which 
one is for hierarchical multicast tree construction. The others are for set-up and 
maintenance of the hierarchy. 

The first protocol determines neighbor sets to dynamically cluster the network. It
does this through the broadcast of a probe message to determine how far away members
are allowed to be from the core. This, in contrast to the resulting multicast, does not scale
for large cluster sizes. In addition, we have seen that broadcast behavior, even a limited
broadcast behavior, is undesirable in an internetwork. Finally, the amount of overhead in
time to compute the clusters is linear with the size of the group. This is an unacceptable
start-up cost.

The third protocol is used to adjust the cores once the group is established to
ensure fairness among the members of the group. This requires all members of a group to
know all clusters and the local cores [57]. Again this is a scaling problem for a large number
of small clusters, which are required for better performance.

The tree construction protocol simulation shows good performance
characteristics for large numbers of small clusters; however, this solution does not scale.
Large clusters have both a scaling and performance problem. The fact that it only supports
2 levels makes it unsuitable for scope control with a finer granularity. Our proposal permits
a multi-level hierarchy with similar performance traits and greater flexibility in clustering
because of the multiple levels. The main advantage is that we do not introduce the
broadcast behavior, high start-up costs, and performance problems of a single entry and exit
point for a cluster which is required by GOMR. Also, we do not have the inherent
scalability problems that this proposal has.


D. EVALUATION OF EXISTING APPROACHES 


Table 2.1 lists the criteria for evaluating the above proposals and implementations for 


scalability, scope control, center location, use of a flat address space, and aggregation. 


Table 2.1: Desirable Attributes of a Multicast Protocol

Attribute                 Interpretation

Scalability (S)           Handles many dense as well as sparse wide-area groups.
                          Requirements for any internet level multicast protocol.

Scope Control (SC)        Well-defined boundaries for data and membership forwarding.
                          No overhead for un-involved nodes.

Path Length (PL)          Comparison of paths generated with the shortest path
                          source-specific trees.
                          Primary metric for performance characterization.

Center Location (CL)      Protocol requires a center for resource discovery or tree
                          construction.
                          Impacts the performance of the resulting tree.

Flat Address Space (F)    Every multicast can be global in scope.
                          Addresses cannot be used for scope control by themselves.

Aggregation (A)           Routers can reduce the state maintained by combining many
                          individual entries.
                          Permits handling of senders and/or groups without excessive
                          memory/update requirements.





In Table 2.2, we show how each of the multicast schemes can be characterized by the 


above attributes: 


Table 2.2: Summary of Existing Multicast Protocols

DVMRP      Shortest paths via multiple trees
M-OSPF     Requires OSPF
PIM        Shared tree/SST hybrid
CBT        Shared trees with multiple cores
HDVMRP     Hierarchical version of DVMRP
HPIM       Hierarchical RP selection
GOMR       2 level cores

X = not supported, √ = supported/support possible, “” = unknown, -- = not required


Some of the above table entries are proposals, such as HPIM, HDVMRP and GOMR.
As a result, the exact status of any particular point may not be known at this time.

The initial multicast solutions do not scale due to broadcast behavior in the protocols. 
The proposals to address scalability have introduced new problems of their own. For 
example, since they rely on center-specific trees to solve the broadcast problem, they have 
introduced a center location problem. None of the proposals has provided a complete
solution to the scope control problem beyond an administratively scoped address which
imposes a simple two level hierarchy through address filtering. None of the solutions has 
proposed aggregation of groups to simplify handling of a large number of groups and only 


CBT permits aggregation of senders. Several of the above proposals are beginning to 


examine the use of a hierarchy, but only as an incremental improvement to problems in 


existing solutions. 


E. FOCUS OF PRESENT WORK


The proposed cluster-based hierarchical architecture solves many of the problems 
discussed in the previous section. It does not propose an entirely new protocol suite from 
the ground up. Rather it exploits the existing protocols in a structured hierarchical 
architecture. 

We have developed this cluster-based hierarchical architecture for network level 
multicast (CHARM) and evaluated its performance with respect to path length. We note 
our basic objectives and focus areas. 


• Scope Control - To include a comprehensive solution for controlling the scope
of data flow as well as flow of membership information.

• Performance - The worst case path length penalty must be known depending
upon the scope of the group.

• Scalability - It must be possible to perform sender aggregation and facilitate
group aggregation.

• Center Location - There must not be a need to locate centers dynamically and
in a group specific manner.

• Flat Address Space - The group address must remain flat and the scheme must
permit use of name and address resolution schemes.







III. CHARM: A CLUSTER-BASED HIERARCHICAL
ARCHITECTURE FOR MULTICAST


We have noted in the previous Chapter that a hierarchy can be used for multicasting
in internetworks to address scope control and scaling issues. For illustration, we draw an
analogy with the hierarchy in the postal delivery system. Using the initial digits of the ZIP
code, a letter can be quickly identified as having a local delivery destination. Thus, the post
office can process mail locally instead of regionally and the scope of the letter can be
quickly identified. The scaling issue is addressed by the ability to have many local offices
which relieve the load on the regional offices. These, in turn, relieve the load on the national
distribution points and so on. In addition, the intermediate routing points need not know the
exact location of the final destination, only its approximate location. The finer detail of the
address, such as the street number, is not required until the letter gets near its destination.
This simple example shows how a hierarchy is deployed to address the issues of scalability
and scope control. We note, however, that the ZIP code represents a hierarchical address
and in multicasting over internetworks, we need to maintain the non-hierarchical nature of
the group address. The ZIP code system bears a strong analogy with hierarchical unicast
routing. We begin this Chapter with a description of such a scheme and relate it to the
proposed scheme for multicasting.


A. HIERARCHICAL UNICAST 


The hierarchical unicast solution proposed in References [21] and [32] is primarily 
aimed at reducing the number of routing table entries and the number of updates required 
to keep the tables current. Given the growth of the Internet at the time, it was determined 
that storage and updates of routing information would soon become prohibitive in a flat 
addressing scheme. We explain this solution briefly below. 

Consider a network of N nodes where each node has a routing table. The unicast tables
have a number of entries that contain the destination address, the delay to that destination,
the outgoing interface and the hop count. The outgoing interface, or next node entry, is the
link upon which traffic is forwarded to get to the destination along the shortest path. The
shortest path normally refers to the minimal delay path, not hop counts. Hop counts are
used to determine link or node failures in the network. The problem addressed by the
hierarchical solution of [21] and [32] is the following: given a fixed number of nodes in a
network, N, some number of levels in a hierarchy, m, and some number of nodes at each
level, n, minimize the number of routing table entries for each router in that network.

In Figure 3.1, we show a simple flat topology with N=18 nodes to illustrate the
reduction in the number of router table entries when a hierarchy is used. We assume that a
distributed routing algorithm, such as [26], allows for the exchange of routing information
between neighbors. When this information has been exchanged for some period of time,
each of the nodes contains an entry in its routing table for each of the other nodes. This
entry contains the information about the shortest path to each destination. If we include a
self-entry, then each node contains 18 entries in its unicast routing table. The size of this
table in a flat topology is O(N), which in a large network, would be prohibitive to maintain.


Figure 3.1: A Simple Flat Network Topology 


The solution to this problem is very similar to the mail delivery analogy mentioned
above where the postal address is a unicast destination. The unicast destination is identified
by a single address; however, its address can be aggregated with others that are in the same
destination network. The imposition of a hierarchy on a flat topology as shown above
implies clustering of nodes. Clusters permit aggregation of individual addresses inside a
cluster into one address, thereby reducing the number of entries in the routing tables of the
individual nodes. A clustered version of the above topology is shown in Figure 3.2. The
clusters are numbered using a decimal notation system. Nodes are defined as 0th level
clusters. 0th level clusters are grouped into 1st level clusters which in turn are grouped into
2nd level clusters. The top level cluster contains all of the nodes in the network.







Figure 3.2: Imposition of a Hierarchy on a Flat Network Topology
(MBR = Multicast Border Router)


A sample router table for node 1.1.2 of cluster 1.1 in Figure 3.2 is shown in Table 


3.1 below. 
Table 3.1: Sample Unicast Routing Table [32]
(row groups: 0th Level Cluster Entries, 1st Level Cluster Entries, 2nd Level Cluster Entry, Self Entries)





Note that the routing table now has 10 entries instead of 18 since all addresses for 
clusters 1.2 and 1.3 are aggregated into one entry each. Given this example, it can be seen 
how a simple hierarchy saves almost 50% of a unicast table’s size by aggregating the 
addresses. 
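The entry count can be reproduced with a short sketch. The 18-node layout below is hypothetical, chosen only so that the totals match the 18 and 10 entries quoted above; it is not claimed to be the membership of the clusters in Figure 3.2. The function keeps a full address for nodes in the same lowest level cluster and only the shortest distinguishing cluster prefix for all other nodes, plus a self entry at each level.

```python
def hierarchical_table(node, all_nodes, include_self_entries=True):
    """Return the sorted routing table destinations kept by `node`.

    Addresses use the decimal cluster notation, e.g. "1.1.2" is node 2 of
    cluster 1.1, which in turn belongs to cluster 1.
    """
    own = node.split(".")
    entries = set()
    for other in all_nodes:
        parts = other.split(".")
        shared = 0
        while shared < min(len(own), len(parts)) and own[shared] == parts[shared]:
            shared += 1
        # Keep the prefix up to the first component that differs from our own.
        entries.add(".".join(parts[:shared + 1]))
    if include_self_entries:
        # One self entry for each cluster that contains this node.
        for depth in range(1, len(own)):
            entries.add(".".join(own[:depth]))
    return sorted(entries)

# Hypothetical layout: clusters 1.1, 1.2, 1.3 and 2.1 with 5, 4, 4 and 5 nodes.
nodes = (
    [f"1.1.{i}" for i in range(1, 6)]
    + [f"1.2.{i}" for i in range(1, 5)]
    + [f"1.3.{i}" for i in range(1, 5)]
    + [f"2.1.{i}" for i in range(1, 6)]
)
table = hierarchical_table("1.1.2", nodes)
```

Under this layout the table of node 1.1.2 holds five 0th level entries, three 1st level entries and two 2nd level entries, with clusters 1.2 and 1.3 each reduced to a single destination.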

References [21] and [32] show that, through clustering and address aggregation,
significant savings in routing table space can be achieved at the cost of a path length 
penalty. By routing traffic to a node based on its destination cluster, traffic for all nodes in 
that cluster follows the same path. This may not yield shortest paths for all members in the 
destination cluster. In addition to minimizing the router table size, References [21] and [32] 


also examine this path length penalty. We start by stating their assumptions. 


1. Assumptions for Hierarchical Unicast 


The path length penalty examined in References [21] and [32] is based on several 
assumptions resulting from the routing table minimization problem. In what follows, graph 
G represents the network N under consideration. 

Assumption 3.1: Graph G is a connected graph with a set of vertices, V,
connected by a set of edges, E. This graph is not directed and the weight on every edge of
the graph is unity. Therefore, G = (V, E). G represents the network, N, and vertices represent
the network “nodes” or routers.

Assumption 3.2: There exists, at each node in the graph, a routing table which is
assumed to contain shortest path routing information for the other nodes in the graph,
including the outgoing interface to reach a destination node along that path. This is the
unicast routing table.

The assumptions that follow have been made to permit formulation of the routing table
size optimization in Reference [21].

Assumption 3.3: The underlying m-level hierarchical clustering structure of the
network nodes is such that all clusters at level k, C_k, are of equal degree, n_k, k = 1,...,m. The
degree of a kth level cluster, C_k, is defined as the number of (k-1)th level clusters included in
C_k. Also, the subset of nodes composing a cluster at any level and their incident edges
constitute a 1-connected cluster subnetwork (at least one path exists between any pair of
nodes).

The two implications of Assumption 3.3 are that clusters are of equal size at the same 
level and that a node cannot belong to a cluster if it does not have at least one path, internal 
to that cluster, to another node in the cluster. In the work we present in this thesis, sizes of 
clusters need not be the same. We do consider the 1-connectedness of clusters essential to 
our architecture (see Section C.2). 


Assumption 3.4: The diameter, which is the maximum shortest path between any
pair of nodes, of any kth level cluster is less than or equal to a quantity diam(C_k), k = 1,...,m.
diam(C_m) represents the diameter of the entire network, diam(N), and

\[
\mathrm{diam}(C_k) \ge \mathrm{diam}(C_{k-1}) > 0 \quad \forall k \qquad (3.1)
\]


This assumption states that, based on Assumption 3.3, when all of the clusters at 
the same level are the same size, it is not possible to have a parent cluster in the hierarchy 
with a diameter smaller than any of the clusters at the level below. We find, through 
experimentation on random graphs that are more representative of the internetworks today, 
that this is not necessarily true and may be difficult to enforce on a real internetwork. 

Assumption 3.5: A cluster at any level k = 1,2,...,m contains a shortest path 
between any given pair of nodes which belong to that cluster. 

This assumption states that traffic routed between two nodes in the same cluster 
must be routed internally to that cluster and that the internal routing contains the shortest 


path between these two nodes. 


2. Kamoun and Kleinrock Result 

Using the above assumptions, Kleinrock and Kamoun show that large savings can be
gained by introducing a hierarchical clustering structure into the unicast routing problem.
The length of the routing tables can be reduced from N entries to e ln N entries [32]. The
path length penalty to be paid for this saving can be as much as two times the shortest path.
Further, in the case of hierarchical unicast, the path length penalty approaches zero as
the size of the network goes to infinity. This is described as:

\[
N \to \infty \;\Rightarrow\;
\begin{cases}
h_c / h \to 1 \\
l / N \to 0
\end{cases}
\]

where h_c is the path length in the hierarchy, h is the shortest path and l is the router
table length. Thus, a significant reduction in router table entries is achieved for large
networks with essentially no increase in path length.
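The scale of the saving can be checked numerically with a small sketch. It assumes the equal-degree clustering of Assumption 3.3 and reads the table length as one entry per cluster at each level, so that l is the sum of the degrees n_k and N is their product; this reading, the function names and the sample degrees are our assumptions for illustration.

```python
import math

def table_length(degrees):
    """Routing table length l for per-level cluster degrees n_1, ..., n_m."""
    return sum(degrees)

def network_size(degrees):
    """Number of nodes N covered by an equal-degree hierarchy."""
    return math.prod(degrees)

def optimal_table_length(num_nodes):
    """The e ln N table length quoted from [32] (real-valued lower target)."""
    return math.e * math.log(num_nodes)

degrees = [10, 10, 10]           # hypothetical three-level hierarchy
n_nodes = network_size(degrees)  # flat routing would need this many entries
l_hier = table_length(degrees)
l_best = optimal_table_length(n_nodes)
ratio = l_hier / n_nodes         # the quantity l/N, which shrinks as N grows
```

For these sample degrees a flat table of 1000 entries drops to 30 entries, and the e ln N expression gives roughly 18.8, which shows how quickly l/N falls as the network grows.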


B. UNICAST vs. MULTICAST USING A HIERARCHY 


The relationship between flat and hierarchical multicast is different from that between 
flat and hierarchical unicast. Group addresses are global or “flat,” and the address 
effectively aggregates all members of a group into a single address. Thus, the only savings 
in router table space comes from source address aggregation, unlike the unicast destination 
address aggregation. Consider the postal delivery analogy. We now have one ZIP code 
which translates to several destinations that are not geographically close. As a delivery 
mechanism, the post office does not even know who or where the receivers are unless the 
receivers subscribe to the ZIP code or the sender supplies a list of receivers. Simply using 
a ZIP code-like mechanism appears insufficient for multicast. A better analogy for 
multicast is cable television. The cable company has one cable on which information from 
many sources is sent to many receivers. It is the receivers’ responsibility to select the 
desired channel. Aggregation is achieved by grouping many sources together on a single 
distribution channel and letting the receivers sort out which senders they wish to receive. 
Along the way, some senders’ data streams may be diverted to other links or dropped if no 
longer subscribed to. Thus the need for the receiver’s responsibility in the multicast case is
an important point of distinction. 

In unicast, the hierarchy permits a node to keep information in its tables about those 
nodes in close proximity while aggregating information about destinations further away. 


The group address and its intrinsic aggregation of receivers’ addresses is “optimally 
aggregated” for a single group using a center-specific tree. As such, adding members does
not increase the routing storage requirements for this group since entries are stored per 
group. Making multicast addresses hierarchical implies a different and unique address for 
a group at each level in the hierarchy. Any increase in the number of ways to address a 
group, which leads to more addresses per group, results in an increase in router storage 
requirements for the group. It has been stated that, due to a lack of structure in present 
multicast addresses, group aggregation is not feasible [1]. As any two groups are likely to 
have membership that is in non-overlapping parts of the network, it is not known how 
group aggregation could be handled even with hierarchical group addresses. The dynamic 
nature of groups makes group aggregation even more difficult. Although it appears feasible 
to aggregate groups if the address were structured hierarchically, by itself it does not appear 
sufficient for multicasting. Thus, the aggregation deployed for multicast needs to be 
different. 

Scope control, one of the motivations for this work, also works differently for
multicast. Since group addresses are to remain global, use of addresses to perform implicit
scope control, as in the ZIP code, is not possible. The only implicit method, the use of the
TTL field, is unrealistic as it cannot be applied in a structured fashion. Thus, explicit scope
control appears to be the only practical solution. By assigning a scope level at group
creation time, we attempt to save on router table space for those nodes which have nothing
to do with the multicast. In addition, we obtain scaling benefits by keeping local traffic
local.


C. TERMINOLOGY 


We decompose our network into a vertical arrangement of subsystems called clusters.
We first define the terms below.

1. Network

Definition 3.1: A network is a connected graph, $G = (V, E)$, with
$V = \{v_1, v_2, \ldots, v_{|V|}\}$ and $E = \{e_1, e_2, \ldots, e_{|E|}\}$.


Definition 3.2: A cut-set of a connected graph is a set of edges whose removal
would disconnect the graph [23].


Definition 3.3: A cut of a network, N, is a cut-set of the underlying graph, G. A
cut partitions V into two subsets, $P$ and $\bar{P}$, such that
$P \cap \bar{P} = \emptyset$ and $P \cup \bar{P} = V$.

The physical interpretation of the graph is of an internetwork in which nodes represent
routers.
2. Clusters 
Definition 3.4: A cluster, C, is a subset of V corresponding to a cut of a network N.


Definition 3.5: All nodes of C with edges to nodes of G not in C are defined as 
border routers of C. 
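
Definitions 3.4 and 3.5 can be illustrated with a minimal Python sketch. The edge list, the node names, and the helper names `cut_edges` and `border_routers` are hypothetical and chosen only for illustration; they do not appear in the original text.

```python
def cut_edges(edges, cluster):
    """Edges with exactly one endpoint inside the cluster (the cut)."""
    cluster = set(cluster)
    return [(u, v) for u, v in edges if (u in cluster) != (v in cluster)]


def border_routers(edges, cluster):
    """Nodes of the cluster that have an edge to a node outside it."""
    cluster = set(cluster)
    border = set()
    for u, v in cut_edges(edges, cluster):
        # Exactly one endpoint of a cut edge lies inside the cluster.
        border.add(u if u in cluster else v)
    return border


# Hypothetical six-node network made of two triangles joined by one link.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
CLUSTER = {"a", "b", "c"}

print(cut_edges(EDGES, CLUSTER))
print(border_routers(EDGES, CLUSTER))
```

In this example the single link between the two triangles forms the cut, and the node at its inner end is the only border router of the cluster.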


Definition 3.6: Nodes of G are defined as level 0 clusters. The set $V_0 = V$ of all
level 0 clusters and the set $E_0 = E$ of edges between them define
graph $G_0$. Note that $G_0 = G$.


Definition 3.7: A level n (n > 0) cluster, $C_n$, is a cut in a graph, $G_{n-1}$, whose nodes
are level n-1 clusters and whose edges, $e'$ ($e' \subset E$), connect
level n-1 clusters.


Definition 3.8: Border routers of $C_n$ are level n border routers.


The physical interpretation of a level 1 cluster is a routing domain. Border routers of
level 1 clusters are domain border routers [43, 44]. A level 1 cluster is also referred to as a
leaf level cluster.


3. Logical Hierarchy 

The proposed hierarchy is an m level hierarchy, where m is the highest level and is the
entire internetwork. Cluster levels range from 0 to m-1, where $C_k$ denotes a level k cluster.
$C_k(s)$ is the level k cluster which contains the node s [32]. The connections between both
nodes and clusters are links. For the purpose of measuring the path length between two nodes
in the network, each link is counted as unity, the equivalent of one hop count. We illustrate
the logical hierarchy in Figure 3.3.

While the logical hierarchy can be described as a tree, it does not imply that network
data flows on the links of this logical tree.
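
As a minimal sketch of the notation $C_k(s)$, the logical tree can be held as a child-to-parent map and climbed k levels from a node. The map `PARENT`, the cluster labels, and the function name `cluster_of` are assumptions made for illustration only.

```python
# Hypothetical logical tree: each node or cluster maps to its parent cluster.
# Nodes s, t, u are level 0 clusters; C1_1 and C2_1 are level 1 clusters;
# C1_2 is a level 2 cluster.
PARENT = {
    "s": "C1_1", "t": "C1_1", "u": "C2_1",
    "C1_1": "C1_2", "C2_1": "C1_2",
}


def cluster_of(node, k, parent):
    """Return C_k(node): the level k cluster that contains the node."""
    current = node          # a node is its own level 0 cluster
    for _ in range(k):      # climb one level of the logical tree per step
        current = parent[current]
    return current


print(cluster_of("s", 0, PARENT))
print(cluster_of("s", 1, PARENT))
print(cluster_of("u", 2, PARENT))
```

The map describes containment only; as noted above, it says nothing about the links on which data actually flows.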


[Tree diagram with the axis "Level #" (labels include m-2, k+1, k-1); the leaves are the level 0 clusters, $C_0$.]

Figure 3.3: Logical Tree Representation of a Clustered Network. From Ref. [21] 


a. Level n

Level n refers to a level in the logical tree at which the different multicast
components exist and operate (i.e., level n clusters as defined in Definition 3.7). The prefix,
n, implies that the components are at the same level with a common parent at level n+1.
There are several level n components which are described individually below.


4. Groups 

Groups are a collection of member nodes who either wish to send or receive multicast 
data as defined in Reference [13]. The members of a group are end systems on a local area 
network (LAN) and are represented by a designated router (DR) for their LAN. The DR is 
a level 0 cluster. A group’s scope is determined by the expected spread of the locations of
the members. The group level is the same as the scope level and refers to the level of the 
lowest common cluster in which all group members are included. The scope of the group 
is determined at the time of the group’s creation. Groups with different scopes are 
illustrated in Figure 3.4. 


Definition 3.9: A level n group is a group whose members belong to the same
parent cluster $C_n$ and whose scope is level n.
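
The relation between membership and group level can be sketched as a search for the lowest common cluster. In the text the scope is assigned at creation time from the expected spread of members; the sketch below only computes the smallest level that would contain a given member set. The map `PARENT` and the name `group_level` are hypothetical.

```python
# Hypothetical logical tree: designated routers (level 0) map to level 1
# clusters, which map to a level 2 cluster.
PARENT = {
    "dr1": "C1_1", "dr2": "C1_1", "dr3": "C2_1",
    "C1_1": "C1_2", "C2_1": "C1_2",
}


def group_level(members, parent):
    """Level of the lowest common cluster that contains every member.

    Assumes all members are level 0 clusters under a common root.
    """
    level = 0
    current = set(members)
    while len(current) > 1:
        # Replace every cluster by its parent and move up one level.
        current = {parent[c] for c in current}
        level += 1
    return level


print(group_level(["dr1", "dr2"], PARENT))   # members share a leaf level cluster
print(group_level(["dr1", "dr3"], PARENT))   # members span two leaf level clusters
```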

Figure 3.4: Sample Groups in a 3 Level Hierarchy


5. Multicast

Multicast refers to the actual data distribution. This assumes that the data distribution 
paths have been set up for proper routing of the traffic. A multicast-capable router is one 
with the ability to perform multicast operations. 


Definition 3.10: A level n multicast is the distribution of data to members of a
level n group. 


6. Multicast Border Router (MBR) 


Definition 3.11: A level n multicast border router (MBR) is a multicast-capable
level n border router where a level n border router is defined in
section C.2 above.


A multicast border router, A, at level n is denoted as $A_n$.


D. ASSUMPTIONS FOR HIERARCHICAL MULTICAST 


Before we can explain the details of the multicast hierarchy, additional assumptions 
are stated regarding the underlying network and the services it provides to multicast. These 
assumptions are based, in part, on existing or proposed implementations. 

The distance and delay information used for the proposed hierarchical multicast is 
based on the assumption that there exists a unicast routing mechanism which supplies the 
shortest path between any pair of border routers. Thus: 

Assumption 3.6: There exists an underlying unicast routing mechanism based on 
domains and border routers such that a border router knows the shortest path to any other 
border router [43]. 

The above assumption is based on present Internet terminology and implementations.
The following is based on proposed implementations: 

Assumption 3.7: All Border Routers are Multicast Border Routers (all are
multicast-capable).

An interim solution prior to Assumption 3.7 being implemented is the use of tunnels.
Tunnels encapsulate multicast traffic in a unicast message and use unicast protocols to send
data between multicast-capable routers [49].

Note that we also adopt Assumptions 3.3 and 3.5 of the hierarchical unicast case:
the clusters are connected, and the shortest path between two nodes in a cluster lies within
that cluster.


E. CHARM FUNCTIONAL COMPONENTS 


In this section, we provide a functional description of the CHARM components.
Where appropriate, an algorithmic description of a component’s behavior is given.


1. Clusters 

Clusters, defined in section C.2, play specific roles in a given multicast. A cluster can 
be a sending cluster, receiving cluster, transit cluster, or any combination of the three. Note 
that group members can be senders only, receivers only, or both. 


Definition 3.12: A sending or source cluster for a group is a cluster with 
members who send data to the multicast group. 


Definition 3.13: A receiving cluster is a cluster with members who receive 
multicast data. 


Definition 3.14: A transit cluster is a cluster that has no members but lies along the
shortest path between two clusters such that multicast data 
transits through one or more of its border routers. 


2. Registrar 

When a group is created, it is advertised by an agent responsible for group registration.
This agent is known as the registrar. The registrar is responsible for group name and
address resolution. This function is presently carried out by the session directory (sd) tool
[30] and is described by the session directory protocol [25]. The sd tool, however, is not
hierarchical. The registrar we propose is level-specific and performs some additional
functions.

A registrar at level n is expected to be a distributed replicated directory service for a
level. Each cluster at every level has a “local” registrar. The level n registrar consists of the 
local registrars for the clusters at this level. All the registrars at one level maintain 
information about groups at that level in a replicated fashion in that the information for 
level n is found in each of the local cluster registrars at level n. The level n control tree, a 
shared tree connecting all the registrars for clusters at one level with a common parent, is 
used to pass the information between local registrars. 

The registrar supports the hierarchy through the following functions:

• to advertise groups and associate a group address with a group name for a
particular level,
• to be a member of the control tree at its level,
• to be a member of the all-MBR tree in its cluster, and
• to store and disseminate, when requested, the MBR information for its cluster.

The information the registrar maintains facilitates group creation, sender registration
and receiver join. The registrar is structured such that level n information is kept at level n.
Figure 3.5 shows an example of registrars at a level. The registrars for a level make up
the level n control tree (see section E.4). Within each of these clusters, there may be one or
more leaf level clusters which also have their own registrar.

Figure 3.5: Registrar Control Tree for a Level

Leaf level registrars know the addresses of each parent registrar in their part of the
hierarchy. Level n registrars do not need to know the addresses of their leaf level registrars.
The registrars at the leaf level communicate with their level n parent registrar using unicast
routing. Any unicast messages that are used for retrieving level n MBR information must
include a return address. Each of the registrar functions is described below.


a. Group Creation 

A request to create a group is made to the leaf level registrar. The leaf level
registrar is available on a well known multicast address within a leaf level cluster. Based
on the requested scope of the group, say level n, the registrar forwards the group creation
request to the local level n parent registrar. The level n registrar then advertises the group
at its level, sharing the existence of this group with all other level n registrars.

This process is illustrated in Figure 3.6. The originator’s group creation request is
sent to the leaf level registrar. The leaf level registrar forwards this group creation
request, with its own address as a return address, to the level n registrar, $R_n$. $R_n$ returns the
group address and distributes it to the other registrars at this level, as shown, on the control
tree. The other registrars use this to make an entry in their group membership tables, which
are used when members of this group appear in the leaf level clusters below them.


[Diagram labels: Originator; leaf level; level n; 1. Create Group; 2. Create Group forwarded; 3a. Group Add...; 4. Group Address forwarded.]

Figure 3.6: Registrar Actions for Group Creation 
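
The group creation exchange can be sketched in Python as follows. This is an illustrative model only: the class `Registrar`, its attributes, and the example group address are assumptions, and the text does not specify how the level n registrar selects the address, so the sketch takes it as an argument.

```python
class Registrar:
    """Hypothetical model of a registrar for one cluster at one level."""

    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.parents = {}   # scope level -> parent Registrar (leaf level only)
        self.peers = []     # other registrars on this level's control tree
        self.groups = {}    # group name -> group address

    def create_group(self, group_name, scope_level, address):
        """Leaf level entry point: forward the request to the scope level."""
        target = self if scope_level == self.level else self.parents[scope_level]
        assigned = target.register(group_name, address)
        self.groups[group_name] = assigned   # the local registrar keeps the entry
        return assigned                      # returned toward the originator

    def register(self, group_name, address):
        """Record the group and replicate it to peers on the control tree."""
        self.groups[group_name] = address
        for peer in self.peers:
            peer.groups[group_name] = address
        return address


leaf = Registrar("leaf", level=1)
r2_a = Registrar("level2-a", level=2)
r2_b = Registrar("level2-b", level=2)
leaf.parents[2] = r2_a
r2_a.peers = [r2_b]

print(leaf.create_group("seminar", scope_level=2, address="224.2.0.1"))
print(r2_b.groups)
```

After the call, the local registrar and every registrar at the scope level hold the entry, while registrars at other levels hold nothing for this group.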


b. Sender Registration 
Senders to a group at any level send the registration request to the closest leaf
level MBR. A sender obtains the list of leaf level MBRs from its leaf level registrar. Once
a sender obtains a leaf level MBR list, the closest MBR is cached for future use.

The leaf level MBR forwards the registration to its closest level n MBR. The level
n MBRs are available to the leaf level MBR from the leaf level registrar. The leaf level
registrar gets the level n MBR list from the level n registrar, prior to any group creation
requests. Level n registrars maintain a list of level n MBRs, for all n, as part of the static
configuration of the hierarchy. The leaf level MBR is expected to cache its closest level n
MBR for every level n > 1.

Once the level n MBR receives the new sender registration, it forwards it out on
the all-MBR tree, a shared tree specific to each level n cluster that connects all level n
MBRs of that cluster and the local level n registrar (see section E.4.b). The level n registrar,
being a member of this tree, distributes the presence of this sender on the control tree, of
which it is also a member, in the form of a new source cluster notification message. The
new source cluster notification contains the level n MBR list from this cluster. It is intended
to initiate a Designated Border Router (DBR) election, as described in section E.3, in the
other clusters at this level. The registrar need only do this once for all groups in a source
cluster, that is, when the first sender for any group in this cluster appears. After a DBR is
elected, only the source cluster and group address need to be passed; the DBR is valid until
all sources have left and it is explicitly pruned. Each group address stored with a level n
registrar is tagged with a source bit. After a sender joins a group, the source bit is marked
as active. This source bit is reset when the last sender in this cluster leaves, enabling the
registrar to initiate a DBR election in other clusters at this level when a sender reappears.

Thus, the registrar interaction for a sender join consists of a leaf level registrar access
(a cached MBR entry at the sender), a level n registrar access (a cached MBR entry at the
leaf level registrar), and a source cluster notification sent on the control tree.

The sender registration process is shown in Figure 3.7 with the sequence of events
identified. It is important to note that once events 1 and 2 have occurred, the information is
cached such that they will not occur for subsequent registrations from this node.

Figure 3.7: Registrar Actions for New Sender Registration
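
The source bit bookkeeping described for sender registration can be sketched as below. This is one possible reading, kept per group for simplicity; the class name `SourceBits` and its methods are hypothetical and are not part of the original design.

```python
class SourceBits:
    """Hypothetical per-group source bit table at a level n registrar."""

    def __init__(self):
        self.active_senders = {}   # group address -> set of senders in this cluster

    def source_bit(self, group):
        """The bit is active while at least one local sender exists."""
        return bool(self.active_senders.get(group))

    def sender_join(self, group, sender):
        """Return True when a new source cluster notification is needed."""
        was_active = self.source_bit(group)
        self.active_senders.setdefault(group, set()).add(sender)
        return not was_active

    def sender_leave(self, group, sender):
        """Return True when the bit was reset by the last sender leaving."""
        self.active_senders.get(group, set()).discard(sender)
        return not self.source_bit(group)


bits = SourceBits()
print(bits.sender_join("G", "S1"))    # first sender: notify the control tree
print(bits.sender_join("G", "S2"))    # bit already active: no notification
print(bits.sender_leave("G", "S1"))   # a sender remains: bit stays active
print(bits.sender_leave("G", "S2"))   # last sender left: bit is reset
```

Because the bit is reset only when the last sender leaves, a sender that reappears later again triggers a notification and hence a new DBR election in the other clusters.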


c. Receiver Join 


A potential group member queries its leaf level registrar for the existence of a group. The
leaf level registrar queries the appropriate level n registrar. This information is provided
only in response to a request and should not be cached as groups are dynamic.

Level n MBRs of a cluster register with their level n registrar. They do this when
they form the all-MBR tree for a cluster by having the local registrar act as the center of
this tree (see section E.4.b). The level n registrar maintains a list of its level n MBRs, which
is static information that changes only when clustering changes or MBRs fail/recover.
When a receiver queries about a group’s existence from the leaf level registrar, it is
provided a list of MBRs for its local level n cluster at the level of the group. The receiver
uses this information to look up the closest MBR to join for externally sourced traffic.
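
Selecting the closest MBR from the returned list can be sketched with a breadth-first search, since each link counts as one hop. The adjacency map, the node names, and the tie-break on the lower name are assumptions made for this illustration.

```python
from collections import deque


def hop_counts(adjacency, start):
    """Breadth-first search: hop count from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_mbr(adjacency, receiver, mbrs):
    """Pick the MBR with the fewest hops; ties go to the lower name."""
    dist = hop_counts(adjacency, receiver)
    return min(mbrs, key=lambda mbr: (dist[mbr], mbr))


# Hypothetical leaf level cluster: receiver r, internal routers x, y, z,
# and two MBRs named A and B.
ADJACENCY = {
    "r": ["x", "y"],
    "x": ["r", "A"],
    "A": ["x"],
    "y": ["r", "z"],
    "z": ["y", "B"],
    "B": ["z"],
}

print(closest_mbr(ADJACENCY, "r", ["A", "B"]))
```

The sketch relies on the cluster being connected, which the adopted unicast assumptions guarantee; an unreachable MBR would raise a `KeyError` here.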

The information flow for a receiver join is shown in Figure 3.8. Leaf level
registrars are also expected to cache MBR entries for levels which have group activity.
Thus, steps 2 and 3 do not occur once this information is cached at the leaf level.

Figure 3.8: Registrar Actions for New Receiver Inquiry


3. Designated Multicast Border Routers 


An MBR of a receiving cluster can become a designated MBR (DBR) for a source
cluster. A DBR is specified for a receiving cluster as the entry point for multicast data from
a given source cluster. Note that the DBR status of an MBR is cluster-specific and not
group-specific. A single MBR can act as a DBR for more than one source cluster. The DBR
election process is described in Figure 3.9. Ties are broken by selecting the MBR
with the lowest network address.


DBR Election at a level n MBR

Receive list of source cluster MBRs from the registrar
Calculate distances to source cluster(s) MBRs
Multicast my distances on my all-MBR tree (see section E.4)

If ( my distance is lowest ) then
    If ( I have receivers attached ) then
        Join the source cluster MBR
    else
        Register source cluster in active (SC,G) table
    Notify all MBRs in my cluster that I am the DBR for this (SC,G)





Figure 3.9: Algorithm for DBR Election 
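
A minimal Python sketch of the election rule in Figure 3.9 follows. The function names and the action strings are hypothetical, and addresses are compared as plain strings here, whereas a real implementation would compare network addresses numerically.

```python
def elect_dbr(distances):
    """Return the winning MBR.

    distances maps each MBR address to its shortest hop count to any
    source cluster MBR. The lowest distance wins; a tie goes to the
    lowest address.
    """
    return min(distances, key=lambda addr: (distances[addr], addr))


def on_election(my_addr, distances, has_receivers):
    """Actions taken by one MBR once all distances have been exchanged."""
    winner = elect_dbr(distances)
    if winner != my_addr:
        return ["record " + winner + " as DBR"]
    actions = []
    if has_receivers:
        actions.append("join the source cluster MBR")
    else:
        actions.append("register source cluster in active (SC,G) table")
    actions.append("notify all MBRs that I am the DBR")
    return actions


# Two MBRs, D and E, both one hop from the source cluster.
distances = {"D": 1, "E": 1}
print(elect_dbr(distances))
print(on_election("D", distances, has_receivers=False))
print(on_election("E", distances, has_receivers=False))
```

Every MBR evaluates the same rule on the same exchanged distances, so each one reaches the same winner without a separate coordination step.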


The initiation of a DBR election is in response to a new sender appearing somewhere 
in the network. If a new sender appears for an existing group and no DBRs have been 
chosen for this cluster, then the DBR election takes place. This occurs regardless of 
whether receivers exist for this group. If there are no receivers in a cluster, the (source 
cluster, group) (SC,G) pair is stored at the DBR and the DBR for this group is stored at all 
other MBRs until a receiver joins. When the first receiver joins a group for this source 
cluster by joining its closest MBR, the tree construction begins. If the MBR is not a DBR, 
it joins the DBR. When the DBR receives a join for this group, signifying the presence of 
a receiver in its cluster, it joins the source cluster. 

The registrar in a source cluster starts the election process by sending a new source 
cluster notification message on the control tree which contains the source cluster MBRs, 
the group address, and the source cluster ID. The registrar learns of new senders in the same
way that level n MBRs do (see Figure 3.7). During the DBR election, each of the MBRs
exchanges its shortest distance to some MBR in the source cluster with other MBRs on its
all-MBR tree. The MBR that elects itself as the DBR forwards out a confirmation on the
all-MBR tree. MBRs keep track of the list of DBRs for their groups and the DBRs keep
track of the (source cluster, group) (SC,G) pairs. Group information is required since
externally sourced multicasts in receiving clusters use (SC,G) entries in their routing tables
because there is no group aggregation, only sender aggregation at the source cluster.


4. Control Trees 


There are two types of control trees in the hierarchy. The all-MBR tree connects all 
MBRs and the local registrar of a cluster for a specific level. There are multiple all-MBR 
groups in the network, one for each cluster at a level. The second type is a level n control 
tree which connects all of the registrars for level n with the same parent cluster at level n+1. 
The all-MBR trees are used to pass low frequency cluster-specific information about MBRs 
and group-specific information. The control trees are used to pass level-specific 
information between registrars. We show these trees constructed for a sample network
which we will use later to describe the operation of CHARM.


a. Level-specific Information 


All registrars are members of the control tree for their level. This tree is formed
before any multicast groups are formed. The tree is a center-specific tree formed using a
well known multicast address. The center for this tree is selected administratively.

The control tree has two primary purposes. It is used in support of the distributed
replicated group directory service which the registrars provide as described in section E.2. 
It is also used to initiate a DBR election by way of a source cluster notification which 
includes the MBR list for a new source cluster sent to the other registrars at a level. 

A sample 3-level hierarchy is shown in Figure 3.10. The leaf level, or level 1, registrars
are members of the level 1 control tree. Two level 1 control trees are shown since each
level 2 cluster contains two leaf level clusters. A single level 2 control tree
connects the level 2 registrars. Note that there is no level 3 registrar at the top of the
hierarchy.


[Diagram labels: Level 1 Control Tree; Level 2 Control Tree; Reg = registrar locations.]





Figure 3.10: Control Trees for a 3 Level, 4 Cluster Hierarchy 


b. Cluster-specific Information 

The all-MBR tree is critical to certain functions that must be carried out by MBRs.
It is also a center-specific tree constructed using a well known multicast address known as
the all-MBR group address. The center of the all-MBR tree is located at the node which
serves as the registrar of the cluster. The MBRs register with the registrar, which is also the
center for the all-MBR tree.

The all-MBR tree serves four main functions:

• Senders in a cluster send their registration message to an MBR on this tree
(including their first data packet, in case a sender wants to send to a group
without becoming a member) for all MBRs to join to.
• New source cluster notifications, which include MBR lists, are announced on
this tree to initiate the DBR election.
• MBR-source cluster distances as part of the DBR election and the winner of the
DBR election are passed between MBRs on this tree.
• It is used to pass periodic refresh and hand-off information within a cluster
between MBRs (see Chapter VI, section B).

MBRs maintain the information regarding local and external senders in an active
state table. This state refers to a (source, group), i.e. (S,G), pair or a (source cluster, group),
i.e. (SC,G), pair which presently exists in an MBR routing table. An active state refers to 
those sources and source clusters which are maintained in the active tables. The active 
tables represent active senders and source clusters that are stored in the event a local 
receiver should join to the MBR. One of the MBRs learns about a new sender from a new 
sender notification unicast to it by the sender. This MBR forwards this notification on the 
all-MBR tree for the other MBRs in the cluster. 

New source cluster notifications are multicast on the all-MBR tree by the
registrar. These include the source cluster MBR list and trigger a DBR election. The
individual distances to the source cluster are multicast on the all-MBR tree so that each
MBR can determine the winner. The MBRs store the (source cluster, group) (SC,G) pair
information in their active tables until joined by a local receiver or until the source cluster
DBR sends a leave message.

A 3 level hierarchy is shown in Figure 3.11 with the all-MBR trees. The registrars
are also members of the all-MBR trees at their level. We show four level 1 all-MBR trees
which connect level 1 MBRs and two level 2 all-MBR trees.


[Diagram legend: Level 1 all-MBR tree; Level 1 MBR; Level 2 MBR; Both Level 1 and 2 MBR.]





Figure 3.11: All-MBR Trees for a 3 Level, 4 Cluster Hierarchy 


5. MBR Functions 


Border routers play a special role in both unicast and multicast. For unicast, the 
purpose behind having border routers stems from the assumption of a “core” backbone of 
the Internet and that various domains were attached to this core, usually with a single
router [41]. The special role they play is explained in References [43], [44] and [45], which
discuss the inter-domain routing protocols designed for exchanging reachability 
information between border routers. 

The function of the border router is extended for use in a hierarchy. As stated in section
C.2, a routing domain corresponds to a leaf level cluster. This aligns all leaf level cluster
boundaries with an existing unicast boundary. It is assumed that the unicast border routers
perform more functions than an internal router because of the additional requirements that
they must meet for unicast, such as acting as gateways to other domains running border
gateway protocols [50].

MBR functions in the hierarchy depend on the type of cluster they are in. We now
explain these functions for a sending and receiving cluster.


a. Sending Cluster MBR Functions 


The MBRs in the sending cluster are responsible for specific internal and external 
functions. The external responsibilities are primarily to forward data out to other clusters 
at the same level. The internal responsibilities include joining a group based on a unicast 
new sender registration or information received on the all-MBR tree. A sender sends a 
registration request, with its optional first data packet, to its leaf level MBR. The leaf level 
MBR forwards the packet to its closest MBR at the level of the multicast. This level n MBR 
forwards the multicast on the all-MBR tree and joins the sender. All MBRs who are a point 
of attachment for a DBR in another cluster at this level for this group, also join the sender. 
The result is a source-specific tree rooted at each sender to at least one level n MBR, with 
others joining as required. The MBRs inform attached receivers of the new sender on their 
incoming (source cluster, group) (SC,G) self entry trees. The internal receivers of this group
then join the sender.

Figure 3.12 shows the functions of a source cluster MBR, that is, an MBR in a
cluster with internal senders.


Source Cluster MBR

If ( new sender notification arrives ) then
    If ( unicast ) then
        Multicast new sender notification out on all-MBR tree
        Join the sender
    else
        Register sender in active (S,G) table
    If ( this MBR has local receivers attached ) then
        Send new sender notification message to local receivers
    If ( this MBR has DBR(s) attached for group ) then
        Join the sender (if not already joined)
        Set state on external interface to forward all traffic for group

If ( sender leave notification arrives ) then
    If ( unicast ) then
        Multicast sender leave notification out on all-MBR tree
    If ( this MBR has DBR(s) attached for group ) then
        If ( this is the last sender for group ) then
            Send a prune downstream along the group’s interface
    Un-register this sender from the active (S,G) table

If ( local receiver join message arrives ) then
    Set state on internal interface to self entry for incoming tree
    Forward senders for this group from my active (S,G) table

If ( DBR join message arrives ) then
    Set state on external interface for this group
    Join senders for this group from my active (S,G) table

If ( DBR leave arrives ) then
    Remove state from interface
    Send leave message to internal sources





Figure 3.12: Functions of a Source Cluster MBR 
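
The new sender and DBR join branches of Figure 3.12 can be sketched in Python as below. The class `SourceClusterMBR`, its attribute names, and the action strings are hypothetical; the nesting of the conditions is one reading of the figure rather than a definitive implementation.

```python
class SourceClusterMBR:
    """Hypothetical model of an MBR in a cluster with internal senders."""

    def __init__(self, name, has_local_receivers=False, has_dbr_attached=False):
        self.name = name
        self.has_local_receivers = has_local_receivers
        self.has_dbr_attached = has_dbr_attached
        self.active_sg = set()   # active (S,G) table
        self.joined = set()      # (S,G) pairs this MBR has joined
        self.actions = []        # record of messages sent, for inspection

    def new_sender(self, sender, group, via_unicast):
        if via_unicast:
            # This MBR received the registration directly from the sender side.
            self.actions.append("multicast new sender notification on all-MBR tree")
            self.joined.add((sender, group))
        else:
            # Learned over the all-MBR tree: remember the sender for later joins.
            self.active_sg.add((sender, group))
        if self.has_local_receivers:
            self.actions.append("send new sender notification to local receivers")
        if self.has_dbr_attached:
            self.joined.add((sender, group))
            self.actions.append("forward group traffic on external interface")

    def dbr_join(self, group):
        """A DBR from another cluster attaches: join the known senders."""
        self.has_dbr_attached = True
        for sender, g in self.active_sg:
            if g == group:
                self.joined.add((sender, g))


mbr_a = SourceClusterMBR("A")
mbr_b = SourceClusterMBR("B")
mbr_b.new_sender("S1", "G", via_unicast=True)    # B received the unicast registration
mbr_a.new_sender("S1", "G", via_unicast=False)   # A learned of S1 on the all-MBR tree
print(mbr_b.joined, mbr_a.active_sg, mbr_a.joined)
```

In this run the MBR that received the unicast registration joins the sender, while the other MBR only records the (S,G) pair until a receiver or DBR attaches.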

b. Receiving Cluster MBR Functions


The multicast border routers in the receiving cluster are responsible for 
forwarding data to local receivers from external source clusters. Figure 3.13 shows the
functions specifically required by an MBR for external senders.


Receiving Cluster MBR

If ( join message arrives from local receiver ) then
    If ( there are DBRs for this group ) then
        Join the DBRs for the group requested
        Set state on internal interface for this group
    If ( I am DBR ) and ( this group not joined to source cluster ) then
        Join source cluster MBR
        Set state on internal interface for this group

If ( leave message arrives from local receiver ) then
    If ( no more receivers attached ) then
        Remove state from internal interface
        Send leave message to DBR
    If ( I am DBR ) and ( this is the last receiver in this group ) then
        Remove state for this group
        Send leave message to source cluster for this group
        Set state in active table for the (source cluster, group) pair





Figure 3.13: Functions of a Receiving Cluster MBR 
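
A simplified Python sketch of the join and leave handling in Figure 3.13 is given below. It assumes a single DBR per group and models only local receivers, not joins arriving from other MBRs; the class `ReceivingClusterMBR`, the message tuples, and the receiver names are hypothetical.

```python
class ReceivingClusterMBR:
    """Hypothetical model of an MBR in a cluster with external senders."""

    def __init__(self, name, dbr_for_group):
        self.name = name
        self.dbr_for_group = dbr_for_group   # group -> name of the elected DBR
        self.receivers = {}                  # group -> set of local receivers
        self.sent = []                       # (message, destination, group)

    def is_dbr(self, group):
        return self.dbr_for_group.get(group) == self.name

    def receiver_join(self, group, receiver):
        members = self.receivers.setdefault(group, set())
        first = not members
        members.add(receiver)
        if not first:
            return                            # branch already exists upstream
        if self.is_dbr(group):
            self.sent.append(("join", "source cluster", group))
        elif group in self.dbr_for_group:
            self.sent.append(("join", self.dbr_for_group[group], group))

    def receiver_leave(self, group, receiver):
        members = self.receivers.get(group, set())
        if receiver not in members:
            return
        members.discard(receiver)
        if members:
            return                            # other receivers still attached
        if self.is_dbr(group):
            self.sent.append(("leave", "source cluster", group))
        elif group in self.dbr_for_group:
            self.sent.append(("leave", self.dbr_for_group[group], group))


dbrs = {"G": "D"}                             # D was elected DBR for group G
mbr_d = ReceivingClusterMBR("D", dbrs)
mbr_e = ReceivingClusterMBR("E", dbrs)
mbr_d.receiver_join("G", "Rx")
mbr_e.receiver_join("G", "Ry")
print(mbr_d.sent)
print(mbr_e.sent)
```

Only the first local join and the last local leave generate upstream messages, which keeps the state held toward the DBR and the source cluster independent of the number of local receivers.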


F. AN EXAMPLE OF GROUP OPERATION 


In this section, we provide an example of a multicast using CHARM. We start with a 
topology and a group and examine the steps taken to send to receivers in various locations
to bring out the features of CHARM.


1. Assumed Infrastructure 


Prior to embarking on an example of a multicast group operation, we make the 
following assumptions about the existence of certain entities and support structures. 


We assume that:

• there is a level n registrar working at every level as described above,
• the control trees and all-MBR trees have been set up,
• the MBRs are in place and registered,
• we have a group membership protocol running at the LAN level, similar to
IGMP [20], and
• end-users know how to get the group name and the level.

We start with a three level hierarchy on a random topology with 6 nodes in a leaf level
cluster. The topology is shown in Figure 3.14. In addition, we add one sender and four
receivers in a global group at level 3. This same topology was used to illustrate the control
and all-MBR trees in Figures 3.10 and 3.11.





Figure 3.14: A 3 Level Hierarchy with 4 Leaf Level Clusters 

This example illustrates that the diameters of the clusters are related to the path length
performance of a multicast. The cluster diameters for this example are in Table 3.2.


Table 3.2: Diameters of the Sample Graph 


[Table 3.2 entries are illegible in the source.]





2. Creating a Group 

The group originator’s attached router is responsible for registering the group address 
with the registrar in its leaf level cluster. The scope of the group is determined at creation 
by the originator. The leaf level registrar is responsible for forwarding the request to the 
appropriate level registrar. At this point, the group has no membership, it is scoped, and
only the local registrar and the registrar at the scope level of the group have information
about the group.


3. Sender Join


a. New group 

For a sender to join a group, once it has received the information from the leaf
level registrar about the group address and the leaf level MBRs, it sends a new sender
registration to its closest leaf level MBR. The leaf level MBR forwards the registration to
its closest level n MBR with the sender’s address and the group address. The level n MBR
forwards the sender information out on the all-MBR tree and then sends its own join to the
sender. Each of the other MBRs in this cluster joins the sender if it has receivers
attached. Otherwise, it stores the (source, group) (S,G) pair in its active source table.
Since this is a new group, no receivers are present. The sender can continue to send data to
its closest leaf level MBR until it receives the join request from the level n MBR which
received the unicast registration, at which time it stops sending to the leaf level MBR and
multicasts directly on the newly formed tree.

This result is shown in Figure 3.15. We drop the subscript 3 for now since the only
group we have is level 3. MBR B received the unicast new sender registration from the leaf
level MBR and multicasts it on the all-MBR tree, which includes MBR A. MBR A stores the
(S,G) pair information in its active table and B joins the sender.





Figure 3.15: Sender $S1_3$ Joins the Group


b. Existing Group 
If there are existing members in a cluster for a group, then local cluster receivers
need to be notified of a new sender. When the sender unicasts its registration to its closest
leaf level MBR, and then to the level n MBR, the registration and data flow initially on the
all-MBR tree. The registrar, also a member of the all-MBR tree, sends a new source cluster
notification on the control tree. This message, which contains the MBRs from the source
cluster, is multicast on the all-MBR tree of the receiving cluster and a DBR election process
takes place.

MBRs receive the sender’s information and data and join the new sender if they 
have receivers already attached. If there are receivers for this group in this cluster, they are 
joined the closest MBR. The initial join by a receiver constructs a branch from the MBR to 
the receiver using a Self-entry for a (source cluster, group) (SC,G) pair router table entry. 
The MBR uses this entry to forward new sender information to all receivers joined to it 
along this tree. The receivers then join the new sender directly. 

In our example, when the sender S1 joined the group, a new source notification
is sent to cluster C2, with MBRs A and B as part of the message. The receiving cluster
registrar in C2 passes MBRs A and B to D and E on its all-MBR tree. Both D and E have
a hop count of 1 and D is chosen because its address is lower. No join ensues since no
receivers exist yet in C2, but the (source cluster, group) (SC,G) pair is stored in the active
source cluster table at D and the (DBR, group) pair is stored at E.
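
The election in this example can be sketched as follows. This is a minimal illustration, assuming each candidate MBR reports a hop count toward the source cluster and that addresses compare as integers. The names `Candidate` and `elect_dbr` and the address values are hypothetical and are not part of the CHARM specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str        # MBR label, e.g. "D"
    address: int     # hypothetical numeric address, used only for tie-breaking
    hop_count: int   # hops from this MBR to the source cluster MBRs


def elect_dbr(candidates):
    """Pick the MBR with the fewest hops; break ties on the lower address."""
    if not candidates:
        raise ValueError("no candidate MBRs")
    return min(candidates, key=lambda c: (c.hop_count, c.address))


# MBRs D and E both report a hop count of 1; D has the lower (assumed) address.
d = Candidate("D", address=4, hop_count=1)
e = Candidate("E", address=5, hop_count=1)
winner = elect_dbr([d, e])
```

Sorting on the pair (hop count, address) keeps the tie-break deterministic, so every MBR on the all-MBR tree that sees the same candidate list reaches the same result.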


4. Receiver Joins 


Using a receiver-initiated join model, the receiver’s join request begins the tree
construction process. Using either of the existing explicit join methods, CBT or PIM, a
receiver propagates a join request towards a sender and its nearest level n MBR in the local
cluster.

The receiver initiates a level n MBR query prior to joining the group. R1’s query, in
this case, returns the MBRs A and B. R1 chooses A as its closest MBR and sends a join for
this group to A. MBR A constructs a branch back to R1 using, for example, a CBT-like
method where a join acknowledge is sent back to the requesting node which creates the tree
branch [1]. The state information for this branch is the self entry (SC,G) pair. A also sends
on this branch the senders for this group from its active (S,G) table. R1 can now join a
sender directly.
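
The receiver join just described can be sketched in a few lines. This is only an illustration under stated assumptions: the class `MBR`, the function `closest_mbr`, the group label "G" and the hop counts 1 and 3 are hypothetical, and the table layouts are simplified to plain dictionaries.

```python
class MBR:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.active_sg = {}    # group -> set of local senders (active (S,G) table)
        self.self_entry = {}   # (cluster, group) -> receivers on the self-entry branch

    def register_sender(self, sender, group):
        self.active_sg.setdefault(group, set()).add(sender)

    def join(self, receiver, group):
        """Record the self-entry branch and return the known senders for the group."""
        self.self_entry.setdefault((self.cluster, group), set()).add(receiver)
        return sorted(self.active_sg.get(group, set()))


def closest_mbr(distances):
    """distances maps each MBR returned by the query to its hop count from the receiver."""
    return min(distances, key=lambda m: (distances[m], m.name))


a, b = MBR("A", "C1"), MBR("B", "C1")
for mbr in (a, b):
    mbr.register_sender("S1", "G")

# R1's query returns A and B; the hop counts are assumed for the example.
chosen = closest_mbr({a: 1, b: 3})
senders = chosen.join("R1", "G")
```

With the returned sender list, R1 can then join S1 directly, as in the text.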

We continue with the example by showing the tree constructed when R1 joins the
sender at S1. Thus MBR B and R1 are on a single SST rooted at S1. The resulting trees are
shown in Figure 3.16.


Figure 3.16: Receiver R13 Joins the Sender and Closest MBR 


Next, we illustrate how receivers in cluster C2 join the group. The leaf level
registrars in C3 and C4 return the MBRs D and E to each of the receivers, R2, R3, and
R4. R2 joins first and chooses D as its closest MBR. D constructs a self-entry branch out
to R2 and then, based on its active (SC,G) table, also acknowledges the active source cluster
for this group and adds an (SC,G) branch for the source cluster C1. D is now ready to send
a join to A for this group. A, having an (S,G) entry in its table for the group, knows
to join the sender at S1, grafting a branch on the existing tree constructed by R1.

The resulting tree is illustrated in Figure 3.17. In this example, R2 gets a shortest path
tree to this source. No receiving cluster penalty is incurred because the MBR R2 joined is
also the DBR for this source cluster.

Figure 3.17: Receiver R23 Joins the Source


The next case shows when a receiver does not join the DBR for the source cluster.
Receiver R3’s nearest MBR is E so it joins E. MBR E, having an entry in its tables showing
that D is a DBR for this group, then joins D. The resulting tree is shown in Figure 3.18. In
this case, the shortest path to the sender without the hierarchy is 4 hops but the path taken
is 7 hops. This receiver incurs a penalty in both the sending and receiving clusters. In this
instance, it is a 1 hop penalty in the source cluster and a 2 hop penalty in the receiving
cluster.

Figure 3.18: Receivers R33 and R43 Join their Closest MBRs


The last receiver to join is R4. R4 also chooses E as its closest MBR. However, on the
way to joining at E, the branch is grafted. The join message must still propagate to E in case
there are other source clusters for this group. This is illustrated in Figure 3.18. In this case,
R4 incurs only the source cluster penalty, which is only 1 hop more than the shortest path.


a. Join Latency 

One drawback of a receiver-initiated join is latency. Often, this issue is related to
hard state vs. soft state as discussed in Chapter II, section C.2.b. The receiver-initiated,
explicit tree construction method we describe here produces hard state for all trees, which
does not time out but rather requires explicit leaves to terminate branches. It should be
noted that the join latency for a group is only incurred by the first receiver to join the tree.
Since we allow for source aggregation at the source cluster, this amounts to the first
receiver for a group in a receiving cluster.


5. Sender and Receiver Leaves 


Join latency and leave latency are related. Implicit leaves are addressed in DVMRP in
Reference [20] and they refer to the practice of allowing branches of the delivery tree to
time out. The reason that the leave mechanism is important is that if a receiver is allowed
to leave a group passively (i.e., time out) then the branch to the receiver continues to use
bandwidth unnecessarily for some time after the receiver is gone.

In the hierarchy, groups with the same source clusters are likely to aggregate 
multicasts on the inter-cluster trees. As a result we require all senders and receivers to 
explicitly leave their groups by following a procedure similar to the join. Senders are 
required to send leave messages to their closest leaf-level MBR as for a new sender 
registration. Local receivers send leave messages directly to the senders. 

Source clusters must terminate their existence explicitly since many senders could be 
in one source cluster. DBRs who have had their source cluster terminate then notify their 
locally attached receivers. 

Receivers send leave requests to the MBR they joined. If the MBR has had all
receivers leave, it sends a leave to the DBR(s) it is joined to. If a DBR has no more receivers
or MBRs attached, then it propagates a leave to the source cluster. In this fashion, no
extraneous branches are left after all group members have terminated their existence in a
cluster.
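
The explicit leave chain above can be sketched as hard state that is removed hop by hop. This is a simplified illustration, not the protocol itself: the class `Router` and its methods are hypothetical, each router has a single upstream neighbour, and message formats are omitted.

```python
class Router:
    """An MBR or DBR holding hard state for the members joined through it."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream   # next router toward the source cluster, if any
        self.downstream = set()    # receivers or routers joined through this router

    def join(self, member):
        first = not self.downstream
        self.downstream.add(member)
        if first and self.upstream is not None:
            self.upstream.join(self.name)      # build the branch on the first join only

    def leave(self, member):
        if member not in self.downstream:
            return
        self.downstream.remove(member)
        if not self.downstream and self.upstream is not None:
            self.upstream.leave(self.name)     # prune once the last member is gone


a = Router("A")                 # source cluster MBR
d = Router("D", upstream=a)     # DBR in the receiving cluster
e = Router("E", upstream=d)     # MBR joined to the DBR

e.join("R3"); e.join("R4"); d.join("R2")
state_after_joins = (set(a.downstream), set(d.downstream), set(e.downstream))

e.leave("R3"); e.leave("R4")    # E loses its last receiver and leaves D
state_after_e = set(d.downstream)
d.leave("R2")                   # D has nothing attached and leaves the source cluster
```

Because the state never times out, a branch persists until the last member below it sends an explicit leave.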


G. ILLUSTRATION OF SCOPE CONTROL 


Using the same sample network, we examine a level 2 multicast. In this case, the
sender is in C3, and the receivers are in C3 and C4. C is chosen as the DBR.
The MBRs are different for this level, except for B which also acts as an MBR at level
3. This shows how, for multicasting at different levels of the hierarchy, the data does not
follow the logical hierarchy. It is not required for data to go to a level 2 MBR prior to going
to a level 3 MBR.

The final result of the tree construction is shown in Figure 3.19. R1 joins S1 directly.
Both R2 and R3 join the DBR at C even though R2 is an MBR. In this case the penalty
incurred by R2 is in both the sending and receiving cluster. R3 incurs only a sending cluster
penalty of 1 hop. The difference between this and levels above is that with scope control
and clustering, the penalty is reduced because the diameter of the clusters is reduced.





Figure 3.19: A Level 2 Group where R15, R25, R35 Join S15 


H. PERFORMANCE RELATED ASPECTS 


1. Aggregation 
CHARM allows for source aggregation. If a source cluster has many senders, they are
treated as one sender outside of the source cluster. This is a savings in routing table entries
over the source-specific tree, which requires individual sender information to be maintained
with the group address for a multicast. This is not as good as a center-specific tree which,
in the absence of group aggregation, achieves the best savings since all senders are
aggregated on the tree and no sender information is required [1].


2. Path Length 


We have pointed out in the examples in this chapter the potential for increased path 
lengths due to the hierarchy. We have shown that the path length penalty is dependent on 
the sender and receiver locations with respect to the MBRs and the DBR election process. 


The path length penalty is discussed in detail in the next chapter. 


3. State Space 


State space is defined as the amount of information required by a router to support
the multicast. Large amounts of this information translate into storage requirements and
potential performance penalties. Much of the processing for routing consists of table
look-ups, which can be efficient if the table sizes are kept small. CHARM addresses this in
two ways. First, storage is saved through source aggregation. Second, through scope
control, fewer nodes participate in the routing for a multicast.

State space requirements for the source-specific trees in a source cluster can be high if
there are many senders in the source cluster. This could be relieved by using a
center-specific tree in the source cluster with the concomitant path length penalty. State
space is considered in greater detail in Chapter VI on network overhead.


4. Scope Control 


One of the primary benefits of a hierarchy for multicast is scope control. Granularity
of the scope control possible is related to the number of levels of the hierarchy. The deeper
the hierarchy is, the finer the control over the scope of the group.

The illustration in Figure 3.20 shows the scope control gained by having a “deep”
hierarchy vs. a “flat” one. In a two level hierarchy, any multicast that is not local is global.


(a) A “deep” Hierarchy (m > 2) 


(b) A “flat” Hierarchy (m = 2) 
Figure 3.20: A Flat versus Deep Hierarchy 
IV. PATH LENGTH PERFORMANCE


A. GENERAL REQUIREMENTS 


A primary performance-related goal of the proposed multicast technique is to have the 
data propagate along the shortest path from any sender to any receiver in the group. The 
path length directly affects the delay as seen by the receiver. This end-to-end delay 
measurement is our primary metric for evaluation of CHARM. 

In analyzing multicast performance, delay is a difficult measurement to quantify
because it is composed of several components: processing delay, queueing
delay, transmission delay and propagation delay [5]. Each represents a behavior of the
network which can be modeled individually; however, models which incorporate all of them
become intractable for a large scale network. Path length is directly related to propagation
delay in that it translates to a geographical distance. Path length, measured in hop counts,
can also be related to processing and queueing delay since each hop represents some
processing and queueing at the intermediate nodes. A factor to be considered in modeling
a complex network is that the link delays are dynamic based on load. In our work, we do
not concern ourselves with changing network loads or with the overall cost of the tree.
Thus, we characterize the performance of CHARM in terms of the delay between senders
and receivers of a group relative to the delay along shortest unicast paths. We refer to
increased delay due to the hierarchy as the path length penalty. By focussing on the penalty,
we bound the worst case penalty and show how to improve it.

The objective of this chapter is to spell out alternative approaches for achieving
the minimum penalty for a hierarchical multicast. For a cluster-based hierarchy,
there are three elements that need to be examined, namely, the source cluster, the path
between clusters, and the receiving cluster. We show that in CHARM paths, a deviation from
the shortest unicast path occurs only inside the source and receiving clusters. The
inter-cluster path is ensured to be the same as the unicast shortest path by CHARM.
B. INSIDE SOURCE CLUSTERS


We assume that a cluster at any level has multiple MBRs. It is undesirable to force a
single exit point out of a source cluster as we do not make any assumption about the
connectivity of the clusters at a level, other than that they are connected (see Assumption
3.3). Therefore, if two inter-cluster paths for distinct receiving clusters and the same source
cluster share a common transit cluster, it is not required that the transit cluster’s exit MBR
for both the receiving clusters be identical. A single exit point on the transit cluster would
mean only one of the two receiving clusters can receive data along the shortest path.
Another reason to allow multiple exit points is to help minimize the source cluster penalty.

We examine three approaches for getting the data out of a source cluster that affect the
path length for the internal receivers, that is, receivers for the group inside the source
cluster, differently.


1. Shared Tree Approach 


The tree along which a sender sends data to its cluster MBRs and its internal receivers 
is referred to as its outgoing tree. A simple and straightforward approach is the use of a 
single shared outgoing tree to which all internal receivers and MBRs of the source cluster 
would join. This is a low router state overhead solution and it requires the advertisement of 
a single center for receivers to join to and senders to send towards. It is an efficient use of 
state space since it is a single outgoing tree per group and only one router table entry per 
group is required. A problem with this solution is that it incurs a potential path length 
penalty to the internal receivers. A second problem is that the MBRs, or the exit points, may 
incur a similar penalty. 

The distance along a center-specific tree from the sender to the MBR at the level of
the multicast is at most twice the shortest path between two points on this tree [53]. In the
worst case, the shortest path is the diameter of the cluster. Thus a worst case penalty of one
cluster diameter is potentially incurred for the MBRs and the internal receivers. This
penalty to the internal receivers can be eliminated by using source-specific trees rooted at
each sender, allowing for internal receivers and MBRs to receive the data along the shortest
path. If this is done, for those members of the group internal to the source cluster, no penalty
would be incurred.


2. Broadcast to all MBRs 


In order to determine which MBRs need to join senders, a determination must be made 
about which MBRs will forward data out of the cluster. A simple solution is to have all 
MBRs join all senders. The MBRs then receive data regardless of whether they are exit 
points and a forwarding decision is all that is required to begin forwarding data out. This 
leads to a lower join latency for receivers outside the source cluster. 

A problem with this approach to outgoing trees is a waste of bandwidth and state
space. If there is no need to forward data out of a cluster and the MBR itself is not a
member, then having the MBR join is wasteful. It is particularly so in a source cluster with
a large number of senders. An explicit local join, as necessary, is a better method.


3. Explicit Local Joins 


The decision on whether or not to forward data out of a particular MBR of a source
cluster needs to be based solely on the presence of receivers in an external cluster that
request receipt of data out of that MBR. The decision on whether an MBR needs to forward
data out is made outside of the source cluster. This method is in keeping with the
receiver-initiated join principle. Once an MBR receives a join from a DBR, it joins all
senders in its cluster for that group. This is a more efficient use of state space and bandwidth.

Thus, CHARM requires that internal receivers join local senders’ source-specific
trees. MBRs join the source-specific tree only if a join request is received from outside the
source cluster. Level n MBRs maintain local sender information in their active (source,
group) tables (see Chapter III, section E.4.b). Receivers get the list of senders from their
closest level n MBR when they join.
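
The explicit local join rule can be sketched as follows. This is an illustration under assumptions: the class `SourceClusterMBR`, its method names and the labels "S1", "S2", "D" and "G" are hypothetical, and only the state kept at one source cluster MBR is modelled.

```python
class SourceClusterMBR:
    def __init__(self, name):
        self.name = name
        self.active_sg = {}   # group -> set of local senders (active (S,G) table)
        self.joined = set()   # (sender, group) source-specific trees this MBR is on

    def register_sender(self, sender, group):
        # Registration only records the sender; it does not trigger a join.
        self.active_sg.setdefault(group, set()).add(sender)

    def on_dbr_join(self, dbr, group):
        """A join arriving from an external DBR makes the MBR join every local sender."""
        for sender in self.active_sg.get(group, set()):
            self.joined.add((sender, group))


mbr = SourceClusterMBR("A")
mbr.register_sender("S1", "G")
mbr.register_sender("S2", "G")
joined_before = set(mbr.joined)   # no external receivers yet, so no trees joined

mbr.on_dbr_join("D", "G")
```

Until a DBR asks for the group, the MBR holds only the table entries and uses no bandwidth on the senders' trees, in contrast to the broadcast approach of section B.2.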
C. BETWEEN CLUSTERS


The goal for the inter-cluster tree is to guarantee a shortest path between the receiving 
and source clusters. The primary reason for this is the unknown nature of the connectivity 
between two clusters in a large internetwork. The only assumption about the inter-cluster 
topology is that the end-points are MBRs and, given our definition of leaf-level clusters, 
they are domain border routers (see section C.2 in Chapter III). 

We examine three approaches for getting the data between a source cluster MBR and 


a receiving cluster MBR along the shortest path. 


1. Flood and Prune Approach 


A simple solution to ensuring that the data arrives at the receiving cluster’s MBRs
along the shortest path is to use a flood and prune technique. This works well because the
clusters have multiple MBRs and this ensures that the data reaches all of them along the
shortest path back to the source cluster’s MBRs. This also assumes that all MBRs of the
source cluster forward all data on all external links. This in turn assumes that all source
cluster MBRs are joined to every sender inside the source cluster as described in section
B.2 above.

This is similar in concept to the approach used in Reference [42] to ensure reliable
delivery of time-critical messages. The fact that the destination cluster may receive
multiple copies works well for receivers joined to their nearest MBR. No DBR would be
necessary. In addition, prune messages could be used to reduce the amount of traffic once
the shortest paths are known.

The problem with this approach is that the potential for flooding on a global scale is
unacceptable for global-level multicasts. In addition, the concept of broadcasting to all
MBRs of a source cluster is not acceptable for reasons stated in section B.2. As chosen in
PIM and CBT, explicitly constructed paths eliminate these problems.
2. Using Cross-cluster Trees


To build an explicitly constructed source-specific tree, rooted at a source cluster MBR,
we need an MBR in the receiving cluster to act as the DBR for that receiving cluster. The
role of the DBR would be to receive multicast data on a per source cluster basis and either
inject it into the receiving cluster, forward it across the cluster on a cross-cluster tree, or
both. A cross-cluster tree is constructed by having all MBRs of a receiving cluster join a
DBR and form a source-specific tree rooted at the DBR. A sample network is shown in
Figure 4.1 which illustrates the construction of inter-cluster and cross-cluster trees. The
resulting tree for a single receiving cluster is a point to point shortest path link from the
source cluster MBR, A in Figure 4.1, to the receiving cluster DBR and a source-specific
tree rooted at the DBR to all MBRs of the receiving cluster. All MBRs in a receiving cluster
should be on the shortest path back to the MBR of the source cluster.





Figure 4.1: Inter-cluster and Cross-cluster Trees 


One of the problems with this method is in the construction of the inter-cluster tree. 
When a join message propagates back to a source cluster, it may traverse other clusters, as 
in Figure 4.1 above. If the join encounters a cross-cluster tree in another cluster for the 
desired source cluster, it is forced to graft at that point to the cross-cluster tree. This 
problem is illustrated below. 

For the purposes of illustration, we describe a path as a series of routers. Actually, it
is an alternating sequence of links and routers but can be specified by the routers alone since
a link is defined by its end-points, which are routers [23]. To further simplify the
description of inter-cluster and cross-cluster paths, we shorten the sequence to include only
MBRs.
Assume the following relationships for Figure 4.2: 
CZ, {A,QO}<{B, D} 
Coy (B,D, F, GPS{A,O,E, A} 4B, D, Fy Gra ae Gc. 
{D, F}<diam(C2) 
C4) 1H,1,K53S7.G,5,M > {AIK bs Gai, 
{H,I}<{G,J}<diam(C3) 

The DBR for C2, MBR Q, joins source cluster C1 at A. The cross-cluster tree is formed
from MBRs E and F to Q as shown. The DBR for C3, MBR G, sends a join for source
cluster C1 along its shortest path {G,F,D,B}. The join message encounters C2’s
cross-cluster tree for the same source cluster at MBR F and grafts. The worst case penalty
due to this graft could be equal to the diameter of C2 (diam(C2)). The cross-cluster tree for
C3 is then formed from I and J to G. Similarly, the DBR for C4, MBR K, sends a join on what
would be its shortest path to C1 {A,Q,F,H,I,K} but encounters the cross-cluster tree of C3
and grafts at MBR I. Again the worst case penalty for this graft could be diam(C3). This
would result in an inter-cluster penalty of diam(C2) + diam(C3) to every receiver in cluster
C4.





Figure 4.2: The Penalties of the Cross-cluster Tree Approach 
Thus, with cross-cluster trees there is a potential for a diameter penalty for every
cluster transited. This is unacceptable. The solution is to eliminate the cross-cluster
tree and not force a single entry point for a cluster.


3. Designated Border Routers 


With cross-cluster trees eliminated, explicit joins result in shortest path
source-specific inter-cluster trees. The result is a forest formed for each source cluster, with
the number of trees determined by how many different MBRs of the source cluster are
attached to. Join requests to a source cluster are allowed to graft to existing inter-cluster
trees. The graft would occur to another shortest path tree back to a source cluster MBR;
therefore, it would not incur any penalty. As a result, we ensure that a receiving cluster has
a shortest path to the source cluster.

In the example of Figure 4.2, the DBR of C3 at G would not graft but rather construct
its own tree back to the source cluster MBR at B. The DBR for C4 at K would also not graft
at J in C3 but would graft at Q in C2. Thus, each cluster would receive its inter-cluster
traffic along a shortest path.


D. INSIDE RECEIVING CLUSTERS 


As in the source cluster case, we can have multiple entry points into a cluster, as a
result of each receiving DBR building a shortest path back to a source cluster. Initially, we
have a DBR per source cluster, which is a single entry point for a receiving cluster. This
allows us to aggregate senders on a per source cluster basis for a group.

Receivers do not know who the senders are and should not have to. This allows for 
open groups as described in section A.2 of Chapter II. It also allows the aggregation of 
senders in a source cluster. The simplest solution is to give the receivers one location to go 
to for their data. We need to examine the cost in path length that such a simple solution 
would incur. 

We examine three approaches for getting the data to the receivers of the receiving
cluster from an external source cluster.


1. Join the DBR Only


One approach is to have the receivers join towards the DBR. This option is the best 
choice for receivers with respect to delay because the receiver would be on a shortest path 
back to the source cluster. 

The problem with this is that there may be many DBRs that a receiver has to join,
resulting in a potentially large number of joins. This also violates our assumption that the
receiver knows little about the make-up of the group. It requires knowledge on the part of
the receiver about the location of senders. Maintaining multiple joins is further
complicated in the case of DBR hand-offs, which are discussed in the next chapter.

This solution is undesirable due to the amount of knowledge about the group
constitution a receiver has to maintain.


2. Join the Closest MBR 


A solution which maintains the simplicity of a single join for the receiver is to have
the receiver join its closest MBR. The closest MBR is determined after receiving a list of
MBRs from the registrar as discussed in the previous chapter. This results in a trade-off
between simplicity and performance since the closest MBR may not be on the shortest path
back to a source cluster. We examine the penalty that is incurred by this solution in section
E.

An added benefit of this solution is that it makes it possible to implement a
receiver-initiated join in the cluster-based hierarchy. An MBR must receive a join message
prior to joining the appropriate DBRs. This amounts to a reduction in the DBR’s state space.
Only routing information to those MBRs with receivers attached and any receivers that may
have joined the DBR directly must be maintained.


3. Join and Graft 


A variation on the above approach allows for an improvement to the path length
penalty using the features of an existing protocol [2]. If a receiver’s join encounters an
existing tree between its closest MBR and a DBR, it grafts onto the tree at that point. The
join message must still propagate to the MBR in case there is more than one DBR for this
group and to provide a self entry branch for propagation of new sender notification
messages. The example in Chapter III in Figure 3.18 illustrates this concept when R4 joins.

One other type of graft is permitted. We allow MBRs with receivers attached who find
themselves on another cluster DBR’s shortest path back to a source cluster to graft onto that
tree. This is only allowed if the entry for this tree matches both the source cluster and the
group. If the tree is for another group then a graft is not done, since technically it is a
different tree. But now that the MBR knows that it has a shorter path back to a source
cluster, it is allowed to propagate a join request along this path. The MBR continues to
receive data from its own DBR for the (source cluster, group) pair until this new branch/tree
is constructed, after which it prunes itself for this (source cluster, group) pair from the DBR.
This will only occur if the MBR has already joined its own DBR; if it has not, then the prune
will be unnecessary.

We use the sample graph in Figure 4.3 to illustrate this point. The receiver, r, joins
its closest MBR at F. F is on the shortest path for the same group and source cluster along
{B,D,F,G}. Thus, F grafts to that tree and does not join the DBR for this source cluster. If
there were other source cluster DBRs, the normal join process would continue.





Figure 4.3: Grafting to an Existing Tree 


E. AN UPPER BOUND 


We have now examined the components of a path in CHARM that make up a complete
path from any sender to any receiver in the network. Receivers inside a source cluster incur
no penalty since they use source-specific trees to every internal sender. Inter-cluster trees
are source-specific trees and as such, no penalty exists between MBRs of source and
receiving clusters. Potential penalties are incurred only inside the source and receiving
cluster for a sender-receiver pair from two different clusters.

We impose a restriction on receiving clusters with the use of a DBR. This results in 
data from a source cluster being limited to potentially one entry point to a receiving cluster 
at the DBR and one exit point at the source cluster at the DBR point of attachment. We 
show how this has the effect of increasing the path length in the source cluster and bound 
that penalty for the worst case. 

We choose the simple solution for the receiver to join its closest MBR. We have stated
that the potential for a path length penalty now exists. We also show how that penalty
occurs and bound that penalty for the worst case.


1. Source Cluster Penalty 


The source cluster penalty is a result of a single path used from a source to a receiving
cluster for many senders and receivers. The penalty occurs when a DBR joins an MBR of
a source cluster that is not on the shortest path from a sender to a receiver. This is illustrated
in Figure 4.4. The difference between the path lengths for the receiving cluster MBR (ε) is
small back to the source cluster. The DBR, C, in this case, chooses A_1 as its point of
attachment. The sender is located at B_1, and the furthest B_1 could be from A_1 is the
diameter of the source cluster, marked as diam(C1). This is the worst case for a source
cluster. All receivers in the receiving cluster pay a penalty of the diameter of the source
cluster for this sender.





Figure 4.4: Source Cluster Penalty 


2. Receiving Cluster Penalty 

The receiving cluster penalty is a result of the receiver joining its closest MBR for a
source cluster. The penalty occurs when the MBR is not on the shortest path back to the
source cluster, which in turn happens when the closest MBR is not the DBR for the source
cluster. This is illustrated in Figure 4.5. The worst case occurs when the receiver is
approximately half the cluster diameter away from both the DBR and the MBR it is joined
to and the MBR joined is a diameter away from the DBR. In Figure 4.5, MBR C_2 is
diam(C2) away from the DBR at B_2. The receiver joins C_2 as it is closer by ε than B_2. Had
the receiver chosen B_2, the data would have had to traverse only 1/2 diam(C2) to get to the
receiver. Instead, it traverses 1 and 1/2 times the diameter. Thus, the worst case penalty is
a diameter.

Figure 4.5: Receiving Cluster Penalty
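
The worst case above can be worked through symbolically. As a sketch, write d for diam(C2) and ε for the small amount by which the joined MBR is closer to the receiver than the DBR. The symbols ℓ_direct and ℓ_joined are introduced here only for illustration and do not appear in the original figure.

```latex
\begin{align*}
\ell_{\text{direct}} &= \tfrac{1}{2}d
  && \text{(DBR to receiver)} \\
\ell_{\text{joined}} &= d + \left(\tfrac{1}{2}d - \varepsilon\right)
   = \tfrac{3}{2}d - \varepsilon
  && \text{(DBR to joined MBR, then MBR to receiver)} \\
\ell_{\text{joined}} - \ell_{\text{direct}} &= d - \varepsilon
  \;\longrightarrow\; d \quad \text{as } \varepsilon \to 0
\end{align*}
```

The difference approaches one cluster diameter, which matches the bound stated in the text.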


3. Summary 


The worst case penalty is composed of two parts for any sender and receiver pair. It is 
bounded by the sum of the source cluster and receiving cluster diameters. 

The maximum penalty applies at any level, k, in the hierarchy. 

MaxPenalty = diam(C_k(s)) + diam(C_k(r)) (4.1)

It is important to note that this penalty is applied to a specific sender and receiver pair, 
not to an entire cluster and not to an entire group. In Chapter VII, we will show that the 
maximum penalty is rarely reached. The more likely case is that some penalty is incurred 
but it is not near the maximum. 

The value of the worst case penalty is dictated by the size of the clusters at their level
in the hierarchy. The logical consequence is that smaller cluster sizes reduce the maximum
penalty. However, the number of nodes in a cluster does not have a direct relationship with
the diameter when the network has a regular topology. If a cluster is fully connected then
the diameter is one hop. The upper bound on the diameter of an n node cluster occurs when
the nodes are connected linearly. The bound is n-1 hops. One cannot state, with any
accuracy, a relationship between the diameter and number of nodes for an irregularly
connected cluster.
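
The two extremes of cluster diameter, and the bound of Equation 4.1, can be checked on small graphs. This is a minimal sketch assuming unit-cost links and connected, undirected clusters; the helper names `diameter`, `complete`, `line` and `max_penalty` are hypothetical.

```python
from collections import deque


def diameter(adj):
    """Longest shortest-path length, in hops, of a connected undirected graph.

    adj maps each node to an iterable of its neighbours.
    """
    best = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:                      # breadth-first search from start
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is not connected")
        best = max(best, max(dist.values()))
    return best


def complete(n):
    """Fully connected cluster of n nodes."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def line(n):
    """Cluster of n nodes connected linearly."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def max_penalty(source_cluster, receiving_cluster):
    """Worst case penalty: sum of the source and receiving cluster diameters."""
    return diameter(source_cluster) + diameter(receiving_cluster)
```

For example, `diameter(complete(5))` is 1 hop and `diameter(line(5))` is 4 hops, so two five-node clusters can give a worst case penalty anywhere from 2 to 8 hops depending on their internal connectivity.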
We next examine methods to reduce the worst case penalty. As expected, any
improvements come at the cost of some additional infrastructure.


V. IMPROVING THE WORST CASE


We have shown the worst case penalty in the last chapter to be the sum of the source
and receiving cluster diameters. This makes predicting the worst case difficult as it requires
knowing the diameters of the clusters. While it might be practical to predict or measure the
diameter of a leaf level cluster, the diameter of a cluster becomes more of an unknown as
the scope of the multicast increases and the clusters are at the upper levels of a hierarchy.

The DBR election process is based on inter-cluster distances. All receivers in a 
receiving cluster pay the source cluster penalty for a sender. If there is only one or a few 
senders in a source cluster, the DBR for a receiving cluster could be attached at an MBR 
on the source cluster which is poor for all or most of these senders. This is due to the 
proximity of the senders to the MBR which is the DBR point of attachment on the source 
cluster. There may be another source cluster MBR with shorter paths to most of the senders 
internally with a distance to the DBR which also happens to be equal to or very near to the 
shortest path chosen by the DBR. Choosing this other MBR on the source cluster would 
reduce the source cluster penalty for all receivers in the receiving cluster. There is, at
present, no way to know the location of this other MBR. 

In this chapter, we propose methods to improve the worst case receiving cluster 
penalty and to reduce the source cluster penalty. We show how we can reduce the receiving 
cluster worst case to the diameter of a leaf level cluster. While we cannot reduce the worst 
case source cluster penalty for all senders, we introduce a method to reduce the average 
penalty, especially for those cases mentioned above. 

In order to improve on the architecture presented thus far, we need to assume 
additional infrastructure or provide additional information. We look at each case 


individually and comment on the practicality of our proposed solution. 
A. SOURCE CLUSTER SIDE


1. Additional Infrastructure 


On the source cluster side, we do not attempt to reduce the worst case source cluster 
penalty. Under this architecture, with the receiving cluster making the decision about which 
MBR of the source cluster to attach to, the potential of realizing the worst case is always 
present. However, we can provide the receiving cluster with additional information about 
the senders inside of the source cluster to guide the DBR election process into making a 
more informed decision. This is particularly helpful in the case of multiple MBRs in both 
the receiving and source clusters which have identical or near identical path lengths. 
Presently, we use a simple tie breaking procedure for the receiving MBR in which the lower 
address becomes the DBR. This proposal injects intelligence into the tie-breaking process. 

The idea is to provide a Measure of Goodness (MOG) for a source cluster MBR. All 
MBRs of a source cluster are aware of the senders within their own cluster. We propose 
that each MBR calculates the distance from itself to each of the senders in its cluster. This 
can be done for senders in each group or senders in all groups in that cluster. Calculation 
of the MOG in a per group fashion increases the MBR overhead for the DBR election in 
that several MOGs must now be computed for each source and sent to the other clusters on 
the control tree. Alternatively, including all senders for all groups generates less traffic on 
the control tree. 

Each MBR averages over the number of senders to arrive at an average distance to 
each sender. Two types of MOG could be defined. For example, the Measure of Goodness 


for an MBR per group could be: 


$$\mathrm{MOG}_G = \frac{1}{|S_G|} \sum_{s \in S_G} d(\mathrm{MBR}, s) \qquad (5.1)$$

where $S_G$ is the set of senders in group G and $|S_G|$ is the number of senders in group G.

Similarly, we define the MOG for all cluster senders to be:

$$\mathrm{MOG}_C = \frac{1}{|S_C|} \sum_{s \in S_C} d(\mathrm{MBR}, s) \qquad (5.2)$$

where $S_C$ is the set of senders in the entire cluster, regardless of which group they
belong to, and $|S_C|$ is the number of senders in a cluster C.
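
As an illustration of Equations (5.1) and (5.2), the following sketch computes both measures for a single MBR. The function name `mog`, the sender names and the hop distances are hypothetical values chosen for the example and do not come from the simulations.

```python
def mog(distances, senders):
    """Average hop distance from one MBR to the given senders.

    distances maps a sender to d(MBR, sender) in hops.
    """
    senders = list(senders)
    if not senders:
        raise ValueError("the MOG is undefined for an empty sender set")
    return sum(distances[s] for s in senders) / len(senders)

# Hypothetical hop distances from one MBR to the senders in its cluster.
dist_from_mbr = {"s1": 3, "s2": 3, "s3": 1, "s4": 5}

# Hypothetical group membership of those senders.
group_senders = {"G1": ["s1", "s2", "s3"], "G2": ["s4"]}

mog_g1 = mog(dist_from_mbr, group_senders["G1"])  # per group, as in (5.1)
mog_c = mog(dist_from_mbr, dist_from_mbr)          # whole cluster, as in (5.2)
```

With these values the per-group measure for G1 is 7/3 hops and the cluster-wide measure is 3 hops. The per-group form requires one value per group to be computed and advertised, which is the additional overhead noted above.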

Once each MBR has computed its MOG, the information is passed to the registrar for 
this level to be stored with the MBR list it already maintains. When a DBR election needs 


to take place and the registrar passes the source cluster MBR address information on the 


all-MBR tree, it also passes the MOG associated with each source cluster MBR.


2. Use of the MOG to Reduce the Penalty 


To use the MOG, we modify the DBR election process. The modified algorithm is in 
Figure 5.1. The list of source cluster MBRs contains the MOG data. The distance to the 
source cluster MBRs and the MBRs' MOGs are input into some function to compute an
overall figure of merit (FOM) for a source cluster MBR. Since the lower number is better 
for both the distance to the source cluster and the MOG, the function could be as simple as 
an add. Then the FOM would be the average distance from a receiving cluster MBR to the 


senders in the source cluster. 


Modified DBR Election at a level n MBR

    Receive list of source cluster MBRs and their MOGs on the all-MBR tree
    Calculate distances to source cluster(s) MBRs
    Compute FOM = f(MOG, distance)
    Exchange FOM with other MBRs in my cluster on the all-MBR tree
    If (my FOM is lowest) then
        Join the source cluster MBR with lowest FOM
        Notify all other MBRs in my cluster that I am the DBR





Figure 5.1: Algorithm for DBR Election using MOG 
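
The election of Figure 5.1 can be sketched from the point of view of an observer who sees all exchanged values. This is a minimal sketch under stated assumptions: f(MOG, distance) is taken to be the simple sum suggested in the text, ties are broken by the lower address, and the function name `elect_dbr`, the MBR labels and all numeric values are hypothetical.

```python
def elect_dbr(receiving_mbrs, source_mbr_mog, distance):
    """Return (dbr, attachment_mbr, fom) for one receiving cluster.

    receiving_mbrs : MBR addresses in the receiving cluster
    source_mbr_mog : dict, source cluster MBR -> its advertised MOG
    distance       : dict, (receiving MBR, source MBR) -> inter-cluster hops
    """
    candidates = []
    for r in receiving_mbrs:
        # Each receiving MBR computes its best FOM over all source MBRs.
        # Assumption: FOM = distance + MOG (the "simple add" case).
        fom, s = min((distance[(r, s)] + source_mbr_mog[s], s)
                     for s in source_mbr_mog)
        candidates.append((fom, r, s))
    # Lowest FOM wins; equal FOMs fall back to the lower address.
    fom, dbr, attachment = min(candidates)
    return dbr, attachment, fom

# Hypothetical example: equal inter-cluster distances, different MOGs.
mogs = {"A": 2.0, "B": 0.5}
hops = {("X", "A"): 4, ("X", "B"): 4, ("Y", "A"): 5, ("Y", "B"): 5}
result = elect_dbr(["X", "Y"], mogs, hops)
```

In this example MBR X becomes the DBR and attaches at source cluster MBR B with a figure of merit of 4.5, whereas the plain lowest-address tie-break would have selected A.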
We illustrate how the MOG can reduce the penalty using Figure 5.2. We place three
senders in the source cluster C1. The DBR for cluster C2 chooses MBR A to attach to since
the inter-cluster distances are equal and A is lower than B. If this is allowed to happen, the
receivers in cluster C2 incur a source cluster penalty of 3 hops for both S1 and S2. No
penalty is incurred for S3. Table 5.1 shows the MOG calculations for this case. MBR B is 
chosen if the MOG is used and the overall penalty to receivers in C2 is reduced. Only the 
sender at S3 incurs a source cluster penalty of 1 hop. This also illustrates that the use of the 
MOG does not eliminate the source cluster penalty and so cannot improve on the worst case 
for any one sender, but it can improve overall performance at the additional cost of 


calculating and storing the MOG. 





Figure 5.2: Sample Clusters to Illustrate use of the MOG 


Table 5.1: MOG Calculation 





3. Practicality 


This proposal is similar in concept to the schemes in Reference [21] for Closest Entry 


Routing (CER) and Overall Best Routing (OBR). These two aggregation schemes used in 
hierarchical unicast pertain to the amount of internal routing information that would be
propagated outside of a destination cluster. With CER, no routing information describing 
the internal behavior is propagated outside the destination’s cluster. For OBR, an average 
estimated distance to all nodes in the destination cluster is made available [21]. We turn this 
around and apply it to the sending side and refine it to include an average distance to the 
senders rather than all nodes in a cluster. 

This proposal appears practical from an implementation viewpoint since each MBR 
needs to know where the senders are anyway in case it becomes a point of attachment for 
a DBR. However, knowing the address of a sender and knowing the distance to the sender 
are not the same. The distance to a sender may not be in the MBR's tables, which means it
would have to obtain it, probably from the leaf level MBR. In the case of a cluster high in 
the hierarchy where the MBR and the sender are not near each other, this could become an 
unreasonable burden for an MBR. This is especially true if the number of senders is large. 
It is more likely, therefore, that the group specific MOG, MOG_G, would be used over the
cluster MOG, MOG_C.


Before we decide if we should use the MOG in the DBR election process, we need to 
examine if it is really necessary. Based on the simulation results (Chapter VII), it appears 
that the maximum source cluster penalty is not incurred too often, if at all. The case of a 
single sender in a cluster with many receivers far away, such as in a distance learning 
environment, appears to be the ideal case for this type of improvement. The option should 


be available, as we said before, on request. 


B. RECEIVING CLUSTER SIDE 


1. Additional Infrastructure 

In this section, we propose a method to reduce the worst case receiving cluster penalty 
to the diameter of a leaf level cluster. In order to do this, we assume additional 
infrastructure in the form of a congruent unicast hierarchy. A congruent unicast hierarchy 


implies that the same hierarchical clustering and routing that we describe for multicast 
exists for unicast. This is not true in the present Internet. We also assume that the unicast
hierarchy ensures that routing traverses the hierarchy strictly. All MBRs for every level in 


the hierarchy are co-located with the unicast border routers (UBRs) for the same clusters. 


2. Use of the Unicast Hierarchy to Reduce the Penalty 


In the unicast hierarchy, we assume that the border routers at different levels have a 
similar relationship to border routers in the present Internet. Similar to the way that they are 
used to provide reachability information about members in their domains, they provide 
reachability information about nodes in their clusters [43]. That is, the border routers can 
provide routing information about the nodes in their cluster. Given this hierarchical nature, 
the UBRs provide shortest path information between UBRs at different levels of the 
hierarchy. 

The actions of a receiver join are modified to do the following: 

• a receiver joins its closest leaf level MBR,
• a leaf level cluster DBR is elected,
• the path this DBR gets from the unicast hierarchy will contain a level n MBR
due to the congruent unicast hierarchy,
• this level n MBR would be the DBR if no unicast hierarchy existed.

Thus, the shortest path in the unicast hierarchy will pass through the MBR that our 
present scheme would choose as the DBR (disregarding the MOG case). This amounts to 
reducing the worst case penalty for the receiving cluster to the diameter of a leaf level 
cluster, which we assume to be small. 

We illustrate this in Figure 5.3. The receiver at R is deep in the hierarchy. Its closest 
level n MBR is at B. Without the congruent unicast hierarchy, R joins B and B joins the 
DBR at A. But with the modifications stated above, R would join its leaf level MBR at D 
and D would join the DBR at the leaf level. In this case, we assume D is the DBR. The 


shortest path to the source cluster that D receives from the unicast hierarchy goes through 
A, which would have been the level n DBR for this cluster. The worst case penalty is


reduced to the diameter of the leaf level cluster containing R and D. 







Figure 5.3: Reducing the Receiving Cluster Penalty 


C. THE SINGLE SENDER CASE 


We have shown two methods to reduce the penalty in the general case for a group with 
some arbitrary number of senders and receivers. There is one special case which should be 
addressed. This is the case of the single sender to a group with an arbitrary number of 
receivers. Based on the simulation data in Chapter VII, this special case warrants our 
attention. Since we have a single source cluster in this case, we have only one DBR per 
receiving cluster. It does not make sense to pay the receiving cluster penalty for receivers 
who join their closest MBR. If they are allowed to join the DBR for this source cluster, we 
eliminate the receiving cluster penalty. We cannot eliminate the source cluster penalty for 
reasons we have already elaborated on, but if we use the MOG as described above, we can 
approach the shortest path for almost every receiver in the group. 

For this situation we propose a special notification message to the MBRs that this is a 
special group type. The MBRs would then notify the receivers to join the DBR instead. We 
then have only a minimal source cluster penalty, due to MOG, with a shortest path from the 


source cluster to all receivers in the group. 
D. SUMMARY

We have examined the two components of the penalty and offered solutions to 
improve the performance in both cases. In one case we can improve the path length 
performance for a receiving cluster by reducing the sending cluster penalty. In the other 
case we can actually tighten the upper bound of the penalty by a significant amount. In the 
third case, we have a solution to a special group make-up which eliminates the receiving 


cluster penalty. 
VI. NETWORK OVERHEAD


In this chapter, we discuss network overhead, a general term which refers to 
allocation and usage of the network components and their resources. Costs associated with 
network overhead are bandwidth consumption, router state information and traffic 
concentration [56]. 

We discuss the network overhead for CHARM primarily in terms of generated 
control traffic and router memory requirements resulting from state space required. The 
frequency of message transmission is discussed in terms of the expected frequency of
related events and message sizes are given in terms of the number of items in each message. 


Most of the messages contain a group address (4 bytes for an IP address) as an item. 


A. CONTROL TRAFFIC 


We have introduced two types of control trees and discussed their functional 
responsibilities in Chapter II. We now examine these control structures with respect to the 
volume and frequency of the expected traffic. Level-specific information constitutes the 
primary traffic on the level n control tree while cluster-specific information is passed on the 
all-MBR tree. The reader may wish to refer to Chapter III, section E when reading this 


chapter as the functional details are not repeated here.


1. Level n Control Tree 


The level n control tree traffic is dependent on the frequency with which groups are
created/destroyed and the frequency with which senders are added to a group. As described 
in Chapter III, it supports the replicated distributed directory service for group registration. 
The responsibilities of the registrar include passing level-specific group information to 
other registrars physically located in another cluster which has the same parent along the 
tree. The items of information passed are the group address when a group is first created 
and source cluster MBRs for the DBR election process. All registrars at a level have group


address information for all groups in other clusters at their level with the same parent. 
Group information is passed upon group initiation. Source cluster MBR lists are sent
when a sender joins a cluster for the first time. Once a sender appears in a cluster, that
cluster remains a source cluster as long as there is some sender active in it. It does not need 
to pass the MBR list again unless all senders in the cluster have been terminated. 

In addition, the same cluster information (the MBR list) is used for the DBR election
and it is passed between level n and leaf level registrars. This information is passed in 
response to a unicast query from a leaf level registrar whose return address is included. 
Thus, no traffic flows on the level n control tree when this occurs. 

Since the frequency of this traffic is related to the frequency of group and sender
creation, it depends on the collective group membership change activity at a given level. 

The new source cluster notification is of the size of the source cluster MBR list (the
order of the number of MBRs in a cluster times the size of their address) plus the group
address and a source cluster ID. This message is passed once per cluster at this level unless


network reconfiguration requires a new DBR election. 
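
The size of the new source cluster notification can be estimated as follows. This is only a sketch: the 4 byte address size follows the IP address size stated at the start of this chapter, while the size of the source cluster ID is not specified in the text and is an assumed parameter, as is the function name.

```python
ADDR_BYTES = 4  # size of an IP address, as stated in the text

def new_source_cluster_notification_bytes(num_mbrs, cluster_id_bytes=4):
    """Estimate the payload of a new source cluster notification.

    num_mbrs         : number of MBRs in the source cluster
    cluster_id_bytes : assumed size of the source cluster ID (not given
                       in the text; 4 bytes is a placeholder)
    """
    mbr_list = num_mbrs * ADDR_BYTES   # source cluster MBR list
    group_address = ADDR_BYTES         # group address item
    return mbr_list + group_address + cluster_id_bytes
```

For a source cluster with three MBRs and the assumed 4 byte cluster ID, the estimate is 3 x 4 + 4 + 4 = 20 bytes, excluding any packet headers.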


2. All-MBR Tree Traffic


The primary information passed on the all-MBR tree depends upon new sender joins 
regardless of whether they are local or external to a cluster. Either a new source cluster 
notification along the level n control tree or an internal new sender registration from the 
registrar causes traffic on this tree. In response to a new source cluster notification, a DBR
election may take place. 

Two phases occur in the DBR election which generate traffic on the all-MBR tree. 
First, the registrar injects the source cluster MBR list and second, each MBR passes its 
shortest distance back to the source cluster. The size of the first phase message depends on 
the number of source cluster MBRs. The second phase has as many messages as the number 
of MBRs of the receiving cluster. Each message is the size of a local cluster MBR address


and its associated distance back to the source cluster. 
A DBR election takes place once for each source cluster. Therefore, the frequency of
this traffic is expected to be low. 

Internal new source notification is passed on the all-MBR tree in response to a new 
local sender registering with one of the MBRs. This MBR multicasts the new sender 
identification and the related group address on the all-MBR tree. The MBRs then pass the 
information to any local receivers that may be attached for this group. The size of this 
message is two IP addresses and the frequency is dependent upon how often new senders
come into existence after a group has been formed. 

Periodic refresh of routing information and group membership may result in a change 
of DBRs (see section B). The all-MBR tree is used to synchronize the transfer of one DBR 
to another. The registrar is responsible for signalling the start and completion of the
hand-off using the all-MBR tree.

The main advantage to having two control tree structures is that local cluster control 
information stays local to the cluster. For groups whose membership changes quickly, the 


impact is limited to the local cluster. 


3. Interaction Between Leaf Level and Level n Registrars 


The registrar at the leaf level registers groups and then passes the presence of this 
group to the appropriate level registrar. The leaf level registrar stores the name and address 
of the group and any senders’ addresses who join. When a leaf level registrar gets a group 
creation request, it sends a unicast message to the registrar at the requested level of the 
hierarchy with a group address request and its own address as a return address. The level 
registrar stores the group address, returns it to the leaf level and forwards it on the control 
tree.

Additionally, when a new receiver joins a group it queries the leaf level registrar 
for the level n MBR list. The leaf level registrar forwards the request to the level n registrar,
again with its address included. The level n registrar returns the MBR list. The frequency


of these messages is dependent on the frequency with which new receivers join. However, 
this information can be cached at the leaf level registrar so that this interaction will occur
much less frequently. The level n MBR list can be cached since the MBRs are not expected 


to change with the exception of network failures. 


4. Designated BR Election 


The DBR election process is triggered by the appearance of new senders in a cluster 
where there were none previously (see Chapter III, section E.3). It occurs for each new 
receiving cluster, source cluster pair. The maximum number of elections that take place is 
bounded by n(n-1), where n is the number of clusters at a level with the same parent. These
do not occur all at once, rather they take place as new groups and senders are created. In 
addition, DBR choices are expected to remain static, once determined, as all members of
all groups need to leave a cluster or a periodic DBR refresh must cause a change in the DBR
before the corresponding state is explicitly pruned.
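
The n(n-1) bound is simply the number of ordered (receiving cluster, source cluster) pairs among sibling clusters. A short sketch, with a hypothetical function name and cluster labels, makes the count explicit.

```python
from itertools import permutations

def max_dbr_elections(clusters):
    """Count ordered (receiving cluster, source cluster) pairs.

    Each ordered pair of distinct sibling clusters can trigger at most
    one DBR election, which gives the n(n-1) bound.
    """
    return sum(1 for _ in permutations(clusters, 2))

# Hypothetical level with six sibling clusters under one parent.
siblings = ["C1", "C2", "C3", "C4", "C5", "C6"]
upper_bound = max_dbr_elections(siblings)
```

For six sibling clusters the bound is 6 x 5 = 30 elections, reached only if every cluster is both a source and a receiving cluster with respect to every other cluster.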


5. Registration 

Two types of registration are required. First, new senders send to their closest leaf 
level MBR. Second, MBRs register with their level n registrar. 

Senders are required to register for two reasons. First, they identify themselves for 
local receivers and MBRs to join. Second, the registration identifies the presence of a new 
source cluster, if the sender is the first one in the cluster for this group. They must be 
registered so that receiving clusters can join to them. 

The MBR registration allows receivers to get a list of MBRs to choose from prior to 


joining the closest one. This is a static list that is reported once at MBR start-up. 


B. PERIODIC REFRESH AND HAND-OFF 


Network load changes dictate that routes change, new routes get added and old routes 
get dropped. We allow for a periodic refresh of distance information for DBRs which 


includes a periodic check on distances back to the source clusters. If the delay is 
substantially worse, then a new DBR election process is triggered and a hand-off to another
MBR is caused by the present DBR’s notification on the all-MBR tree. 

As discussed in section A.2, the all-MBR tree is used to stabilize the hand-off 
procedure by synchronizing the events. If a hand-off is required, all MBRs attached to the


old DBR join the new one before detaching. 
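
The join-before-detach ordering of the hand-off can be sketched as a sequence of events. This is an illustrative model only: the function name `hand_off`, the event labels and the MBR names are assumptions, and the sketch makes the stronger assumption that every attached MBR joins the new DBR before any of them detaches from the old one.

```python
def hand_off(attachments, old_dbr, new_dbr):
    """Move every MBR attached to old_dbr over to new_dbr.

    attachments maps an MBR to the set of DBRs it is joined to and is
    updated in place. Returns the ordered list of events.
    """
    log = [("start", old_dbr, new_dbr)]      # registrar signals the start
    members = sorted(m for m, dbrs in attachments.items() if old_dbr in dbrs)
    for m in members:                         # phase 1: join the new DBR
        attachments[m].add(new_dbr)
        log.append(("join", m, new_dbr))
    for m in members:                         # phase 2: detach from the old DBR
        attachments[m].discard(old_dbr)
        log.append(("leave", m, old_dbr))
    log.append(("complete", old_dbr, new_dbr))  # registrar signals completion
    return log

# Hypothetical cluster state before the hand-off.
state = {"M1": {"D_old"}, "M2": {"D_old"}, "M3": {"D_other"}}
events = hand_off(state, "D_old", "D_new")
```

Because every join precedes every leave, no attached MBR is left without a path to the source cluster during the transfer.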


C. STATE SPACE 


State space refers to the amount of information that is required to be stored to support
maintenance of trees in the network. There are three main functional components where 
state is stored: the registrars, the MBRs, and all routers in the network. The amount of state 
stored is dependent on the state maintained per level, the state maintained per group and the 


amount of state maintained per cluster. 


1. Registrar 


The registrar keeps level-specific and cluster-specific information. The purpose of the 


information stored is discussed in Chapter III, section E.2. 


a. Level n


Sender information is maintained at the leaf level registrar and passed to the level 
n registrar so that it can be multicast on the all-MBR tree. No sender information is stored 
above the leaf level registrar and no receiver identities are stored anywhere. Group names 
are stored at the level n with the group address. A level n table entry is shown in Figure 6.1. 

The amount of storage required for cluster-related information at each level n 
registrar (n > 0) is the size of a group entry, described in Figure 6.1, times the number of
groups in that cluster. The source tag field indicates the presence of senders in a group in a 


registrar’s cluster. 


Figure 6.1: A Level n Registrar Group Entry 
b. Leaf Level


A leaf level group entry is described in Figure 6.2. The upper bound on the 


amount of storage at the leaf level is the size of an entry times the number of groups. 


Figure 6.2: A Leaf Level Registrar Group Entry 


2. Multicast Routers 


The state required by multicast routers is the routing table information needed to 
properly route a multicast packet. A multicast router (MR) treats multicast routing internal 
to its cluster differently from external cluster routing since sources are aggregated outside 


of a cluster. 


a. Internal Sender State 


Source-specific trees, rooted at senders, forward multicast data to all MBRs for 
the cluster which are points of attachment for a DBR somewhere in the network. In order 
to aggregate the outgoing information from an internal sender, the MBR does not keep 
track of which senders use an external interface. When a DBR joins the MBR, group 
information is passed with the join. The MBR joins the senders as obtained from the local 
level n registrar for this group. The external link is now marked as an outgoing interface for 
this group. For every new join request that a DBR receives, it forwards a group information 
packet to the source cluster MBR so that it joins the appropriate senders. 

The MBR which serves as a point of attachment receives multicast data internally 
which is stored as a (source, group) (S,G) pair in the internal routing tables. It encapsulates
the packet and inserts its own address and cluster ID as the source information with the 
same group address before forwarding the data out on the external link. The outgoing link 
information is stored as a (source cluster, group) (SC,G) pair. The amount of state required 
to be forwarded on an external link of an MBR is O(Gr), where Gr is the total number of 


groups using this external link. 
All routing for the source-specific trees within a source cluster is based on the
internal router table state, which is (S,G) pair information. Thus, for group G inside the
source cluster, the amount of state required is the sum of the number of senders in that
group and, for all groups in a source cluster, $\sum_{\forall G} |S_G|$.


b. Internal Receiver State 


The MRs are responsible for forwarding information to receivers for externally 
sourced multicast data. Data is forwarded from some source cluster point of attachment 
with (source cluster, group) pair (SC,G) forwarding information. The amount of state 
required to support this is O(SCxGr) where SC is the total number of source clusters at this 
level with the same parent cluster and Gr is the number of groups at this level. The MBR 
maintains a self (SC,G) entry, where SC is the source cluster the MBR belongs to, to make 


it possible for local groups to forward an internal source notification report on these links. 


c. External Sender State 


The special case of an MR with no internal group members is the external sender 
state. This is the case where the MR belongs to a transit cluster. The routing state 
information kept is no different than the internal receiver state. All routing state outside of 


a source cluster, regardless of the presence of members, is (SC,G) pair information. 
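
The effect of aggregating senders into source clusters can be illustrated by counting distinct table entries. This is a minimal sketch with hypothetical sender, cluster and group names; the function names are assumptions for the example.

```python
def sg_entries(senders):
    """Distinct (S,G) pairs, as kept inside a source cluster.

    senders is an iterable of (sender, cluster, group) tuples.
    """
    return {(s, g) for s, _cluster, g in senders}

def scg_entries(senders):
    """Distinct (SC,G) pairs, as kept outside the source clusters."""
    return {(cluster, g) for _s, cluster, g in senders}

# Hypothetical group G1 with five senders spread over two source clusters.
active = [
    ("s1", "C1", "G1"), ("s2", "C1", "G1"), ("s3", "C1", "G1"),
    ("s4", "C2", "G1"), ("s5", "C2", "G1"),
]
per_sender_state = len(sg_entries(active))
aggregated_state = len(scg_entries(active))
```

Here five (S,G) entries collapse to two (SC,G) entries outside the source clusters. In general the aggregated count cannot exceed the number of source clusters times the number of groups, which is the O(SCxGr) bound given above.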


d. MBR Cluster Related Information 


In addition to the routing table entries, the level n MBRs maintain tables for 
source related information. A DBR maintains a group to source cluster mapping table and 
the other MBRs maintain a group to DBR table. An example of an entry in the group to 
source cluster table is shown in Figure 6.3. The amount of storage required for this table is
bounded by the number of groups at a level times the number of clusters at this level with the
same parent cluster.


Group Address | Source Cluster ID_1 | SC ID_2 | ... | SC ID_i


Figure 6.3: Level n Group - Source Cluster Mapping 
An example of an entry of the group to DBR table is shown in Figure 6.4. The
size of this table is bound by the number of groups at a level times the number of DBRs in 
a cluster. The group to DBR mapping is arranged by group since an MBR may have more 


than 1 DBR per group. 


Source Cluster ID | DBR Address | DBR Address | ... | DBR Address


Figure 6.4: Source Cluster to DBR Mapping 


Two other tables are maintained by MBRs with source related information. They 
are called the “active” tables, the active (S,G) table and the active (SC,G) table. Their 
functions are similar. They keep information about active senders in their own cluster and 
active source clusters outside of their cluster which they are not presently routing. The 
active (S,G) table is used to forward local senders to receivers when they join their closest
MBR. It is also used if an MBR should become a point of attachment for an external cluster
so the MBR can join the senders. If the MBR does join, the (S,G) pairs are removed from 
the active table and used in the routing table. The only MBRs required to keep active 
(SC,G) pairs are the DBRs for a source cluster which has no receivers in the group. All 
other MBRs keep the DBR, group information mentioned above. When a receiver joins its 
closest MBR, the MBR joins the DBR and the entry is moved from the active table to the 


routing table. 


D. TRAFFIC CONCENTRATION 


Traffic concentration can occur when either multiple groups or multiple senders in a 
group multicast over a few common links. Studies done so far for center-specific trees over 
large groups, show by simulation that certain links may become bottlenecks and in the case 
of a large number of concurrent senders, traffic concentration may occur [56]. 

While there are no center-specific trees for data distribution in CHARM, there are 
shared inter-cluster links. These links are susceptible to traffic concentration. We do not 


force all cluster traffic to a single entry point in the receiving cluster since we allow for
grafting to an inter-cluster tree for the same source cluster and group. However, we do not


provide for multiple DBRs for a source cluster if the DBR links become congested. 


1. Load Balancing the Inter-Cluster Tree 


We propose a solution to the case where a single MBR becomes the DBR for too many 
source clusters. The load on the links when the MBRs measure their distances back to the 
source cluster may change drastically. This would be especially true if the same MBR is
elected as the DBR for many source clusters. It is possible to add a load balancing factor
into the DBR election process. 

If two MBRs have the same distance to a source cluster, then instead of picking the 
MBR with the lowest address to break the tie, the one with the lowest DBR count is chosen. 
This could also be factored into the cases where the DBR count becomes very high and the 
distance measurement is close. In these cases, the DBR count becomes a factor applied to 


the result to make the election more evenly distributed among the MBRs. 
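
The tie-breaking rule described above can be sketched as a sort key. The sketch covers only the exact-tie case; the weighting of the DBR count for near-equal distances is left unspecified in the text and is not modelled. The function name, the addresses and the counts are hypothetical, and addresses are compared as plain strings for simplicity.

```python
def elect_with_load_balancing(distance, dbr_count):
    """Pick the DBR for one source cluster.

    distance  : dict, MBR address -> hops back to the source cluster
    dbr_count : dict, MBR address -> number of source clusters for which
                this MBR is already the DBR
    Shortest distance wins; ties go to the lowest DBR count, and only
    then to the lowest address.
    """
    return min(distance,
               key=lambda m: (distance[m], dbr_count.get(m, 0), m))

# Hypothetical cluster: two MBRs are equally close to the source cluster.
hops_back = {"10.0.0.1": 4, "10.0.0.2": 4, "10.0.0.3": 6}
current_load = {"10.0.0.1": 3, "10.0.0.2": 1}
chosen = elect_with_load_balancing(hops_back, current_load)
```

With the lowest-address rule alone, 10.0.0.1 would be elected again; with the DBR count included, the less loaded 10.0.0.2 is chosen.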
VII. PATH LENGTH RESULTS USING RANDOM GRAPHS


Simulation of the proposed cluster-based hierarchy is important to verify the upper 
bound for the worst case path length penalty that we have established for CHARM and to 
examine how often this bound is reached, if at all. Towards this end, we have computed 
path length results via simulation for various multi-party interactions using random graphs. 

Delay, measured in this simulation as hop count, is a major performance metric in 
the evaluation of our multicast architecture. Thus, we compare the paths provided by 
CHARM with the paths provided by SSTs for the random topologies generated. Such a 


comparison permits us to determine how frequent and severe the path length penalty is. 


A. METRICS 


The primary metric used to characterize CHARM is path length in hops. The choice 
of this metric is based on the following reasons. Firstly, we are interested in the quality of 
trees in a static scenario reflecting stable network conditions. Secondly, in the case of a
lightly loaded network, hop count dominates delay across a path since the transmission 
time becomes negligible and factors such as router processing time and queuing delay, 
which directly depend upon the number of hops, dominate. Therefore, we measure hop 
count which is easy to compute and understand.

We note that, for the random topologies used, there is no correspondence between the 
physical distance between two nodes and the number of hops between them. Thus, a hop 
from Monterey to San Francisco is treated equivalent to a hop from San Francisco to New 
York City. In present-day networks, such situations are entirely feasible as long-distance 


leased lines and direct connections between geographically far-away nodes are common. 


B. TOPOLOGY GENERATION 


We simulate several random topologies and variations of groups which map to
real-world applications that deploy multicast today. The random topologies were generated


using a graph generation tool which permits generation of a two-level topology consisting 
of domains at the lower level and interconnections between domains at the higher level.
The topology is specified by the number of nodes in a domain, the number of domains and 
the node degree of the intra- and inter-domain nodes. The output of the generation tool is a 
list of domain adjacency matrices and inter-domain links. 
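
The algorithm used by the generation tool is not described here, so the following is only a sketch of one way such a two-level topology could be produced. It assumes a random spanning tree per domain for connectivity, random extra links to reach the requested node degree, and a chain of inter-domain links before random ones are added. All names (`random_domain`, `random_internetwork`) and parameter values are hypothetical.

```python
import random


def random_domain(n, degree, rng):
    """Adjacency matrix of a connected random graph with about `degree` links per node."""
    adj = [[0] * n for _ in range(n)]
    order = list(range(n))
    rng.shuffle(order)
    # A random spanning tree guarantees that the domain is connected.
    for i in range(1, n):
        a, b = order[i], order[rng.randrange(i)]
        adj[a][b] = adj[b][a] = 1
    # Add random links until the average node degree is reached.
    target_edges = min(n * degree // 2, n * (n - 1) // 2)
    edges = n - 1
    while edges < target_edges:
        a, b = rng.sample(range(n), 2)
        if not adj[a][b]:
            adj[a][b] = adj[b][a] = 1
            edges += 1
    return adj


def random_internetwork(num_domains, nodes_per_domain, intra_degree, inter_links, seed=0):
    """Return (list of domain adjacency matrices, sorted list of inter-domain links)."""
    rng = random.Random(seed)
    domains = [random_domain(nodes_per_domain, intra_degree, rng)
               for _ in range(num_domains)]
    links = set()
    # Chain the domains first so that the internetwork is connected.
    for d in range(1, num_domains):
        links.add(((d - 1, rng.randrange(nodes_per_domain)),
                   (d, rng.randrange(nodes_per_domain))))
    # Each link is ((domain, node), (domain, node)) with the lower domain first.
    while len(links) < inter_links:
        d1, d2 = sorted(rng.sample(range(num_domains), 2))
        links.add(((d1, rng.randrange(nodes_per_domain)),
                   (d2, rng.randrange(nodes_per_domain))))
    return domains, sorted(links)


domains, links = random_internetwork(16, 8, 4, 24, seed=1)
print(len(domains), "domains,", len(links), "inter-domain links")
```

The output has the same shape as the tool's output described above: one adjacency matrix per domain plus a list of inter-domain links.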

The simulator is constructed using object-oriented techniques to reflect a model of the 
Internet. The model consists of objects which represent network entities such as routers, 
routing tables, domains and border routers. In addition, there are several tools which 
operate on this model to generate least cost paths and trace the route between two nodes [4]. 

The model has been deployed for test and evaluation of tree construction
techniques [51]. The code for the model is written in C++, as are all of the tools which
construct trees on the model, and the simulator runs under SunOS 4.1.3.


1. Graph Generation and Clustering 


Graphs generated for testing purposes are composed of domains or leaf level clusters. 
All leaf level clusters are individually, randomly generated with a node degree of 4. The 
leaf level clusters are connected by inter-cluster links with two distinct node degrees 
between the clusters, one of 4 and another between 8 and 9. The two topology types, 
differing significantly in the inter-domain node degree, permit us to observe the behavior 
of CHARM in richly interconnected inter-domain topologies and validate our conclusions. 

Three graph sizes are constructed from a single 192 node topology. The 192 node
topology, made up of 24 eight-node clusters, is trimmed down to 16 eight-node clusters for
the 128 node case and 12 eight-node clusters for the 96 node case. In some cases, links have
been added manually to ensure connectivity of the clusters. The purpose behind simulating
the different sizes obtained as subsets of a larger topology is primarily to show the effect of
the depth of the hierarchy on the multicast. The scope of the 96 node case, for example, is
equivalent to a lower-level scoped group operating in either the larger 128 node or the
192 node topology. It is for this reason that, as far as the path-length performance goes, it
is possible to simply focus on the top level of a hierarchy and investigate only the groups
operating at the highest level.

The leaf level clusters are kept at 8 nodes in all the topologies. Note that the node 
degree between 8 and 9 is applied only between the clusters. The network sizes in terms of 
the number of nodes are carefully chosen to permit as many clustering combinations as 
possible. Each cluster combination represents a hierarchy with the corresponding number 
of clusters at the top level. 

Clustering above the leaf level is done manually by drawing boundaries around the
8-node leaf level clusters. The clustering scheme for each topology is supplied to the
simulator by supplying cluster IDs and parent-child relationships through an input file.

The diameter of the clusters is, in view of the worst case path length penalty bound, an
important factor to consider for all the topologies. Before describing the results, we
emphasize the effect of clustering on the random graphs we have used. Dividing a
cluster into sub-clusters does not necessarily decrease the diameter. A decrease in the
diameter when a cluster is divided into sub-clusters appears to be a logical consequence
assumed in Reference [32]. For the topologies we have used, the removal of links
to isolate a cluster has exactly the opposite effect on the diameter: the clusters can have a
larger diameter than the graph itself, and therefore the maximum path length
penalty may actually go up. We note here that, to our knowledge, there is no study reported
in the published literature that establishes a relationship between the diameter of a graph
and its number of nodes for random graphs.
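
The effect can be reproduced on a small example. The sketch below uses a hypothetical 8-node ring, not one of the simulated topologies: isolating six of its nodes as a cluster removes the links that close the ring, so the cluster diameter (5) exceeds the diameter of the whole graph (4). The helper names are assumptions made for illustration, and each cluster is assumed to be connected.

```python
from collections import deque


def eccentricity(adj, src):
    """Largest hop count from src to any node reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Diameter in hops of a connected graph given as {node: [neighbors]}."""
    return max(eccentricity(adj, node) for node in adj)


def induced(adj, members):
    """Sub-graph left after removing all links that leave the cluster."""
    members = set(members)
    return {u: [v for v in adj[u] if v in members] for u in members}


def worst_case_bound(clusters):
    """Sum of the two largest cluster diameters."""
    return sum(sorted((diameter(c) for c in clusters), reverse=True)[:2])


ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
cluster_a = induced(ring, range(0, 6))  # becomes a 6-node path
cluster_b = induced(ring, range(6, 8))  # becomes a single link
print(diameter(ring), diameter(cluster_a), diameter(cluster_b))
print(worst_case_bound([cluster_a, cluster_b]))
```

In this example the sum of the two largest cluster diameters is 6, which is larger than the diameter of the unclustered ring.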

For the random graphs simulated, the sum of the two largest cluster diameters is
plotted in Figure 7.1.


C. SIMULATION SET-UP


The simulation input is a group of members, their location, and their role in the group
as a sender or receiver. The clustering used adheres to two rules established in
Chapter III: all clusters of a level are connected, and the shortest path between two nodes
of a cluster lies within that cluster. The second rule is enforced manually in the simulation
by artificially inflating inter-cluster link costs while calculating the shortest path between
two nodes of a cluster. The effect on the simulation is that occasionally, when a
source-specific tree is built by the hierarchy, it may not reflect the actual shortest path,
which may be via another cluster. The result is that there are cases when a penalty is paid
even by receivers internal to the source cluster, which would not normally happen.
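
The cost inflation can be sketched as follows. This is an illustration under stated assumptions rather than the simulator's code: links cost 1 hop, any link crossing a cluster boundary is given a large cost (`inflate`, a hypothetical parameter), and the destination is assumed to be reachable. The five-node topology is likewise hypothetical.

```python
import heapq


def shortest_path(adj, cluster_of, src, dst, inflate=1000):
    """Dijkstra in which links crossing a cluster boundary cost `inflate` instead of 1."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            cost = 1 if cluster_of[u] == cluster_of[v] else inflate
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                prev[v] = u
                heapq.heappush(heap, (d + cost, v))
    # Walk the predecessor chain back from the destination.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Nodes a-d form cluster 1; node x in cluster 2 offers a shortcut from a to d.
adj = {"a": ["b", "x"], "b": ["a", "c"], "c": ["b", "d"],
       "d": ["c", "x"], "x": ["a", "d"]}
cluster_of = {"a": 1, "b": 1, "c": 1, "d": 1, "x": 2}

print(shortest_path(adj, cluster_of, "a", "d"))             # forced inside cluster 1
print(shortest_path(adj, cluster_of, "a", "d", inflate=1))  # true shortest path
```

With inflation the path stays inside the cluster and takes 3 hops; without it the path uses the other cluster and takes 2 hops. That one-hop difference is the kind of penalty paid by receivers internal to the source cluster.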

The simulation handles static analysis of path lengths. We do not simulate group
dynamics by way of membership changes. We also do not account for changes in the path
lengths, so the trees, once constructed, are static.

Although queueing delays play a significant part in routing, they are not considered 
here. The queueing delay is typically related to link loads and processing capacity at 
intermediate routers. It is difficult to simulate the relationship between link load, 
processing capability and queueing delay accurately in a simulation without making it 
unmanageably computation-intensive for reasonably large topologies. 

We do not simulate traffic concentration as discussed in Chapter VI, section D, as it
requires simulation of multiple groups, which we do not incorporate.


D. EXPERIMENTS


1. The Electronic Classroom 


The electronic classroom is a special case of a single sender to a group of receivers
spread out across a wide geographic area. It is also known as distance learning, where one
location multicasts to a group of receive-only participants. This scenario is simulated to
isolate a single sender and see the effects of the hierarchy on a randomly selected
individual sender to a random, moderately sized group of receivers.


The sender selected is the same for all cases reported. There are at most 2 receivers
per leaf level cluster. The numbers of receivers for the three topology sizes are 24
receivers for the 96 node network, 32 receivers for the 128 node network, and 48 receivers
for the 192 node network.

The results from this scenario reinforce the need for the improvement proposed in 
Chapter V, section C for the special case of a single sender. The reduction of cluster 
diameters reduces the path lengths for most receivers to nearly the shortest path. However, 
there are some receivers that incur receiving cluster penalties that are not necessary if we 
use the proposed improvement. 

In Figure 7.2 through Figure 7.7, we plot the mean, median, and the standard deviation 
of the path lengths obtained from CHARM normalized with respect to the path lengths 
obtained from the source-specific tree as the number of clusters is changed. The general 
pattern in the path length plot (plot (a) in each figure) is that the initial clustering, the 2 
cluster case, does well for path length performance. This is primarily due to the fact that 
half of the receivers get their data along the shortest path. As the number of clusters 
increases, the path length penalty also increases initially. As indicated by Figure 7.1, the 
cluster diameters increase at first before going down. This phenomenon is attributed to the 
loss of links in the process of clustering and is the reason why clustering performs worse 
initially. Although the actual value of the penalty goes down as the number of clusters 
increases sufficiently, the number of receivers that incur a penalty goes up. This is due to 
the fact that, with the increase in the number of clusters, there is a greater likelihood that a
receiver is forced to a path along the hierarchy instead of finding the shortest one to a sender 
within the cluster itself. 

The normalized penalty plot (plot (b) in each figure) shows the worst path length
penalty over all sender and receiver pairs in the group, divided by the sum of the
two largest cluster diameters, which is the worst case penalty. We note that this worst case
penalty is never exceeded in any of the cases.
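
The two normalizations used in plots (a) and (b) can be sketched as follows. The sketch assumes that the penalty of a sender and receiver pair is the number of extra hops on the CHARM path relative to the source-specific tree path, and it uses the population standard deviation; neither choice is stated explicitly above. The function names and sample values are hypothetical.

```python
from statistics import mean, median, pstdev


def normalized_path_stats(charm_hops, sst_hops):
    """Mean, median and standard deviation of CHARM path length / SST path length."""
    ratios = [c / s for c, s in zip(charm_hops, sst_hops)]
    return mean(ratios), median(ratios), pstdev(ratios)


def normalized_worst_penalty(charm_hops, sst_hops, cluster_diameters):
    """Worst extra hop count divided by the sum of the two largest cluster diameters."""
    worst = max(c - s for c, s in zip(charm_hops, sst_hops))
    bound = sum(sorted(cluster_diameters, reverse=True)[:2])
    return worst / bound


# Hypothetical hop counts for four sender and receiver pairs.
charm = [4, 5, 6, 3]
sst = [4, 4, 3, 3]
print(normalized_path_stats(charm, sst))
print(normalized_worst_penalty(charm, sst, cluster_diameters=[5, 4, 3]))
```

A normalized path length of 1 means that the hierarchy delivers along the shortest path, and a normalized penalty of 1 would mean that the worst case bound is reached.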


Plots (c)-(f) in Figures 7.2 through 7.7 show how the penalty incurred varies between
the different senders and receivers. We note that, as the number of clusters increases, the
size of the penalty goes down, but the number of receivers that incur some penalty
increases.

The lower node degree between domains leads to a better performance by CHARM. 
This is due to the fact that there are fewer choices for MBRs and fewer paths between 
domains. The result is an increase in the chance that the hierarchy will take the same path 
as the shortest path tree. 

The electronic classroom scenario, being a single sender scenario, cannot be used to
extrapolate the behavior of CHARM across multiple senders, and the likelihood of
pathological cases is higher. We note, however, that in every topology, for the largest
number of clusters reported, the hierarchy imposed by CHARM provides paths that are
comparable to the shortest path for all receivers.


2. Video Conference


The video conference scenario is presently considered to be one of the primary
applications of multicast. It is the case of multiple members in a group that are both
senders and receivers, with some receive-only participants. This scenario illustrates the
effect of the hierarchy on a relatively sparse group where all senders are also receivers.

In the experiments reported, the number of receivers is fixed at twenty-five percent of
the total number of nodes. The numbers of senders for the three topology sizes are 16
senders for the 96 node network, 24 senders for the 128 node network, and 36 senders
for the 192 node network.

In every case, the normalized path length approaches the shortest path as the number 
of clusters increases (plot (a)) and no pair of senders and receivers pays the maximum 
penalty (plot (b)). 

The results also show that for the cases of node degree 4, the hierarchy achieves better
path lengths than it does at the higher node degree. This is because, with a lower node
degree, the paths along the hierarchy are much more likely to be the same as the paths along
the source-specific trees.

Penalty patterns that are relevant to the effect of the hierarchy are displayed in plots (c)
through (f) of Figure 7.8 through Figure 7.13. The source cluster
receivers which incur no penalty are seen as a patchwork of solid blue. In some cases where
we force a sender to receiver path inside of a cluster, a small penalty is paid, which disrupts
the pattern and is indicated by an occasional light blue patch.

Other noticeable patterns are found in Figure 7.3 (c) and (d). Receiver 14 in the 2
cluster case and receiver 24 in the 3 cluster case both appear to have a poor choice for their
closest MBR. This is evident in the striated colors indicating penalties. Similar patterns are
found in other cases for senders. These patterns illustrate clearly that the source cluster and
the receiving cluster penalties are sender- and receiver-specific. This is expected as the
penalties are incurred only inside the clusters and are independent of how the source and
receiver clusters are situated with respect to each other in CHARM.


3. Distributed Interactive Simulation


The most demanding application for multicast is the case of a large group spread out 
over the entire network. This scenario is simulated to show the effect of the hierarchy for 
densely populated groups created by the distributed interactive simulation (DIS) 
community. DIS represents an application that requires widely distributed groups, a large 
number of members, and a large number of concurrent senders [36]. 

In this case, as in the others, the smaller topologies are obtained by pruning down
from the 192 node case to simulate the effect of operating at a lower level in the hierarchy.
We observe in the DIS case, shown in Figures 7.14 through 7.19, that CHARM performs
consistently well for all sizes of graphs, even in a densely populated group scenario.

In the DIS case, group members make up 50% of the network. The purpose of such a
high percentage for this scenario is to gain the widest cross section of the network and
expose some pathological cases which return a maximum penalty incurred by some
sender/receiver pair. All senders are also receivers and there are 48 members for the 96 node
network, 64 members for the 128 node network, and 96 members for the 192 node network.
The result is that in the 192 node, node degree 8.4 case, we see in Figure 7.19 (b) that we
have one pair that incurs almost 90% of the maximum penalty.

The cases of individual receivers with a penalty are evident in Figure 7.16, where there
are several clear light blue and green vertical line patterns which indicate that receivers in
the different clusters incur the receiving cluster penalty for almost all senders. The same is
true of some senders, with yellow horizontal line patterns in Figure 7.14.

The larger node degree does not have an impact in the DIS case, as there are many
members all over the clusters, giving an increased likelihood of getting a good MBR over a
large group. The members numbered 50 through 60 do not have good connections with
the members numbered 40 through 50 in Figure 7.18(c). These
members incur a high penalty due to the likelihood of getting an MBR or a DBR far away
in the case of two large clusters.


Figure 7.14 Distributed Interactive Simulation 96 Node Case Node Degree 4
[Plots: (a) normalized path length (mean, standard deviation, median) versus number of
clusters; (b) normalized penalty versus number of clusters; (c)-(f) normalized penalty by
sender ID and receiver ID for 2, 3, 6 and 12 clusters, shaded as a fraction of the maximum.]


Figure 7.15 Distributed Interactive Simulation 96 Node Case Node Degree 9
[Plots: (a) normalized path length (mean, standard deviation, median) versus number of
clusters; (b) normalized penalty versus number of clusters; (c)-(f) normalized penalty by
sender ID and receiver ID for 2, 3, 6 and 12 clusters, shaded as a fraction of the maximum.]


Figure 7.16 Distributed Interactive Simulation 128 Node Case Node Degree 4
[Plots: (a) normalized path length (mean, standard deviation, median) versus number of
clusters; (b) normalized penalty versus number of clusters; (c)-(f) normalized penalty by
sender ID and receiver ID for 2, 4, 8 and 16 clusters, shaded as a fraction of the maximum.]


Figure 7.17 Distributed Interactive Simulation 128 Node Case Node Degree 9
[Plots: (a) normalized path length (mean, standard deviation, median) versus number of
clusters; (b) normalized penalty versus number of clusters; (c)-(f) normalized penalty by
sender ID and receiver ID for 2, 4, 8 and 16 clusters, shaded as a fraction of the maximum.]


Figure 7.18 Distributed Interactive Simulation 192 Node Case Node Degree 4
[Plots: (a) normalized path length (mean, standard deviation, median) versus number of
clusters; (b) normalized penalty versus number of clusters; (c)-(f) normalized penalty by
sender ID and receiver ID for 2, 4, 8 and 24 clusters, shaded as a fraction of the maximum.]


Figure 7.19 Distributed Interactive Simulation 192 Node Case Node Degree 8.4
[Plots: (a) normalized path length (mean, standard deviation, median) versus number of
clusters; (b) normalized penalty versus number of clusters; (c)-(f) normalized penalty by
sender ID and receiver ID for selected cluster counts, including 4, 8 and 24 clusters.]


E. SUMMARY OF RESULTS


The results from our simulations show that: 
• The hierarchy, depending on the number of clusters chosen, gives path lengths
approaching the shortest path.

• The worst case penalty bound is never exceeded.

• As the number of clusters goes up, a larger percentage of receivers incur a
penalty, but the size of the penalty is smaller.

• For the random topologies generated, clustering does not reduce the diameter
initially. This is reflected in the initial increase of path length in most cases.

• For randomly chosen centers, clustering performance is initially comparable
to center-specific trees. It becomes superior to CSTs as the number of clusters
goes up.

• When clustering is used, we force all paths inside of the cluster. Although we
artificially impose this only for the clustered routing, we do not see this forced
routing often. However, when we do force this, it results in a penalty inside the
source cluster.


In the case of the electronic classroom, even though the center forms a shortest path 
tree to all receivers, clustering performance compares well. 

For video conferencing, the results showed the sending and receiving cluster penalties 
clearly. The hierarchy performed well, approaching the shortest path in all cases as the 
cluster sizes got smaller. 

For DIS, as we have many nodes participating, the number of nodes with a penalty 
increases. However, the penalty in all cases drops off substantially when there are more 
than 6 clusters. Even though we have more sender/receiver pairs with close to the worst 
case penalty, the number is still small relative to the size of the group. 

In none of the simulations is the worst case penalty reached. This shows that the
likelihood of both a sender and receiver being in the worst case scenario is small. The more
likely case is that both incur a penalty which is frequently much less than the worst case.


VIII. CONCLUDING REMARKS


A. CONCLUSIONS 


The focus of the work presented in this dissertation is a new cluster-based hierarchical 
architecture for multicasting in internetworks and the investigation of its performance in 
terms of the path length metric. The assumptions behind this work are that all routers in the 
network are multicast-capable, an underlying two level hierarchy of domains and border 
routers exists, and that these border routers either have or can get the distance to each other. 
The proven and field-tested concept of a receiver-initiated join is maintained in the 
hierarchy. 

The primary motivation behind the development of this architecture is the lack of 
scope control and scalability features in existing techniques. We have deployed a 
hierarchical organization of components such as the registrars and multicast border routers 
in a manner that maintains physical routing of multicast data along the shortest path 
between the leaf level clusters of the hierarchy. 

We summarize the essential features of the hierarchical multicast architecture
described in this thesis as follows.


• Scope Control: The logical hierarchy controls the flow of multicast data
through the use of clustering and border routers without sacrificing the benefits
of a flat multicast address. In controlling the scope of the traffic, the network
overhead is reduced by restricting the traffic from going to destinations where
it is not required to go.

• Performance: The worst case path length penalty is bounded by the sum of the
sending and receiving cluster diameters. Given hierarchical unicast and
collection of a measure of goodness by the border routers, the worst case
penalty can be reduced to the diameter of the leaf level clusters.

• Operation: Senders form shortest path trees to all receivers in their own source
cluster. Receivers with external sources join towards their nearest border
router at the level of the group. Cluster border routers in receiving clusters
choose between themselves which one will act as the designated border router
for a source cluster. Designated border routers join back to the source cluster,
forming a source-based shortest path tree. Border routers with receivers
attached join towards the designated border routers. Border routers in a source
cluster that have been chosen as the source cluster point of attachment for a
designated border router join the actual senders in their cluster.

• Scalability: This architecture provides for sender aggregation, as designated
border routers are chosen on a per source cluster basis rather than on a per sender
or per group basis. It eliminates the need to locate centers at run-time, either as
destinations to join towards or as points to collect sender information from. This is
done by configuring a hierarchy into the network administratively.


We have provided an architecture that is capable of deploying protocols such as PIM 
or CBT in each leaf level cluster. By organizing the information about the hierarchy and 
presence of senders with registrars that are also organized hierarchically, we separate the
resource discovery aspect of multicast completely from the tree construction aspect.


B. FUTURE WORK 


We have shown that the proposed hierarchy performs well with respect to the path 
length and has other useful properties. Now, several aspects of multicasting in datagram 
internetworks in general, and CHARM in particular, can be pursued further. 

The operation of the different components needs to be specified in formal terms using a
specification tool such as Lundy's in Reference [34] to make the interaction and
synchronization between the different components concrete. This will enable identification 
of the synchronization requirements in situations such as DBR hand-off, DBR election, 
level n to leaf level registrar exchanges, etc. Such a specification is the logical next step 
towards an implementation of this architecture via a set of protocols. Similar work is also 
required for the architecture of the registrar hierarchy outlined here. 

The schemes proposed for improving the worst case penalties by deploying a unicast 
hierarchy and introducing a measure of goodness collection by multicast border routers 
need to be evaluated in terms of their impact on the path length performance. A suitable 
measure for the goodness of a border router with respect to its internal senders needs to be
identified and evaluated.

By choosing the clusters in a source-specific manner, we have performed aggregation
of senders in a cluster. We note that the inter-cluster trees are built rooted at the source 
cluster. This potentially permits identification of clusters that have a common set of groups 
in operation. The traffic for these groups can be aggregated on the inter-cluster links to 
further save the state maintained by cluster border routers. This needs a mechanism for a 
cluster border router to join or leave an aggregated inter-cluster tree. Such a mechanism 
needs to be developed and evaluated for CHARM. 

We have bound the scope of a group to a particular level of the hierarchy at group
creation time. For dynamic groups, it is entirely possible that the anticipated level of the
group changes during the group operation either due to all members in certain parts of the
hierarchy leaving the interaction or due to completely new parts of the hierarchy attempting
to join the interaction. In the present architecture, the latter scenario is not possible at all
and the former scenario leads to the group remaining operational at a level higher than
necessary. The ability to handle these situations requires addition of a mechanism for
run-time scope level modification.

Finally, the performance of CHARM needs to be investigated with respect to group
dynamics, traffic concentration, and control traffic bandwidth consumption using an
analysis tool that generates various topologies, groups, and member behaviors.


APPENDIX A. SAMPLE INPUT FILE USED FOR CHARM SIMULATIONS


The following file is a sample input file for the 128 node video conference scenario, 
with node degree 4. The internetwork is defined as a series of domain adjacency tables. 
These are then connected using the Border Group Interdomain Adjacency Table. The 
number of links in this table determines the inter-domain node degree. Group information 
includes the number, location and expected role of the members. The nodes are identified 
using a 2 place decimal notation signifying their domain and their ID in that domain. 

The hierarchy information in the input file starts with the tag "%%Internetwork
Mulitcast Hierarchy%%." The input file shown is set up to run four different topologies,
each with a different clustering. The hierarchy has 3 levels: the domains are at level 1, the
clusters of domains are at level 2 and the entire network is at level 3. The
"%%Clusters%%" tag indicates the number of clusters in each level above 1. The number
of domains is the number of level 1 clusters. The "%%Parent Cluster Definitions%%" tag
indicates to the simulator how the level 1 to level m-1 clusters (where m is the number of
levels) are to be clustered by assigning them to a parent cluster ID. All parent cluster IDs
at level m-1 should be 1 to be in the same network.

The simulator reads in this file and simulates the proper level in the hierarchy based
on where the members are.
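
As an illustration of how the parent cluster definitions could be interpreted, the sketch below groups clusters by their parent cluster ID. It assumes one comma-separated row per level, in which entry i gives the parent cluster ID of cluster i at that level; this row layout, the function names and the sample values are assumptions made for illustration, not the simulator's actual parser or data.

```python
def parse_row(line):
    """Turn a comma-separated row of parent cluster IDs into a list of integers."""
    return [int(field) for field in line.split(",") if field.strip()]


def build_hierarchy(parent_rows):
    """For each level, map every parent cluster ID to the list of its child cluster IDs."""
    levels = []
    for row in parent_rows:
        children = {}
        for child_id, parent_id in enumerate(row, start=1):
            children.setdefault(parent_id, []).append(child_id)
        levels.append(children)
    return levels


# Hypothetical definitions: 16 domains grouped into 4 clusters, which all
# belong to the single top-level cluster 1 (the entire network).
rows = [
    parse_row("1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4"),
    parse_row("1, 1, 1, 1"),
]
for level, children in enumerate(build_hierarchy(rows), start=2):
    print("level", level, children)
```

Under this reading, the last row consists only of 1s because every cluster at level m-1 must have the whole network as its parent.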


//This file contains internetwork data for use with mast.exe
%%MAST Topology Data%%

%%Number of Domains%%
16

%%Domain Adjacency Data%%

%%Adjacency Table 1%%
[8 x 8 adjacency matrix; entries illegible in the scanned source]

%%Adjacency Table 2%% through %%Adjacency Table 16%%
[8 x 8 adjacency matrices, one per table; entries illegible in the scanned source]


%%Border Group Interdomain Adjacency Table%%
21
[inter-domain link entries; illegible in the scanned source]

//This Network topology has
//a total calculated Node Degree of: 4.5


%%Multicast Groups%%
4

%%Group 1%%
%%Senders%%
24
[sender node IDs; illegible in the scanned source]

%%Receivers%%
32
[receiver node IDs; illegible in the scanned source]

%%Internetwork Mulitcast Hierarchy%%
%%Topology 1%%

%%Levels%%
3

%%Clusters%%
2
1

%%Parent Cluster Definitions%%
[parent cluster IDs; illegible in the scanned source]

%%Internetwork Mulitcast Hierarchy%%
%%Topology 2%%

%%Levels%%
3

%%Clusters%%
4
1

%%Parent Cluster Definitions%%
[parent cluster IDs; illegible in the scanned source]

%%Internetwork Mulitcast Hierarchy%%
%%Topology 3%%

%%Levels%%
3

%%Clusters%%
8
1

%%Parent Cluster Definitions%%
[level 1 parent cluster IDs; illegible in the scanned source]
1 , 1 , 1 , 1 , 1 , 1 , 1 , 1

%%Internetwork Mulitcast Hierarchy%%
%%Topology 4%%

%%Levels%%
3

%%Clusters%%
16
1

%%Parent Cluster Definitions%%
[parent cluster IDs; illegible in the scanned source]